Foreword



This volume contains papers selected for presentation at the 26th International 
Symposium on Mathematical Foundations of Computer Science - MFCS 2001, 
held in Marianske Lazne, Czech Republic, August 27 - 31, 2001. 

MFCS 2001 was organized by the Mathematical Institute (Academy of Sciences
of the Czech Republic), the Institute for Theoretical Computer Science
(Charles University, Faculty of Mathematics and Physics), the Institute of
Computer Science (Academy of Sciences of the Czech Republic), and Action M
Agency. It was supported by the European Research Consortium for Informatics
and Mathematics, the Czech Research Consortium for Informatics and
Mathematics, and the European Association for Theoretical Computer Science. We
gratefully acknowledge the support of all these institutions.

The series of MFCS symposia, organized on a rotating basis in Poland,
Slovakia, and the Czech Republic, has a well-established tradition. The aim is
to encourage high-quality research in all branches of theoretical computer
science and bring together specialists who do not usually meet at specialized
conferences. Previous meetings took place in Jablonna, 1972; Strbske Pleso, 1973;
Jadwisin, 1974; Marianske Lazne, 1975; Gdansk, 1976; Tatranska Lomnica, 1977;
Zakopane, 1978; Olomouc, 1979; Rydzina, 1980; Strbske Pleso, 1981; Prague, 1984;
Bratislava, 1986; Karlovy Vary, 1988; Porąbka-Kozubnik, 1989; Banska Bystrica,
1990; Kazimierz Dolny, 1991; Prague, 1992; Gdansk, 1993; Kosice, 1994; Prague,
1995; Krakow, 1996; Bratislava, 1997; Brno, 1998; Szklarska Poręba, 1999; and
Bratislava, 2000.

It is our pleasure to announce that at the opening of MFCS 2001, Dana
Scott (Carnegie Mellon Univ., Pittsburgh, PA, U.S.A.) was awarded the Bolzano
Honorary Medal of the Academy of Sciences of the Czech Republic for his
contribution to the development of theoretical computer science and mathematics
in general and his cooperation with Czech scientists in particular.

The MFCS 2001 proceedings consist of 10 invited papers and 51 contributed 
papers. We are grateful to all the invited speakers for accepting our invitation 
and sharing their insights on their research areas. We thank the authors of all 
submissions for their contribution to the scientific program of the meeting. 

The contributed papers were selected by the Program Committee out of 
a total of 118 submissions. All submissions were evaluated by three or four 
members of the committee, with the assistance of referees, for a total of more 
than 450 reports. After electronic discussions, the final agreement was reached
at the selection meeting in Prague on May 11-12, 2001 (the program committee
members denoted by * in the list below took part in the meeting). We thank all
the program committee members and referees for their work, which contributed
to the quality of the meeting. We have tried to make the list of referees as
complete and accurate as possible and apologize for any omissions and errors.

Special thanks go to Jochen Bern, who provided a reliable software system
used for electronic submissions and reviewing (tested before on STACS 1999
through 2001), and volunteered a fair amount of his night-time to maintain
and further improve the system according to our unrealistic specifications and
expectations.

Finally, we would like to thank Milena Zeithamlova, Lucie Vachova, and
Andrea Kutnarova from Action M Agency for their excellent work on local
arrangements.

We wish MFCS 2002, to be held in Warsaw, success and many excellent 
contributions. 
A New Category for Semantics



Dana S. Scott 

Carnegie Mellon University, 
Pittsburgh, PA, USA 



Abstract. Domain theory for denotational semantics is over thirty years
old. There are many variations on the idea and many interesting
constructs that have been proposed by many people for realizing a wide
variety of types as domains. Generally, the effort has been to create
categories of domains that are cartesian closed (that is, have products and
function spaces interpreting typed lambda-calculus) and permit solutions
to domain equations (that is, interpret recursive domain definitions and
perhaps untyped lambda-calculus).



What has been missing is a simple connection between domains and the 
usual set-theoretical structures of mathematics as well as a comprehensive logic 
to reason about domains and the functions to be defined upon them. In December 
of 1996, the author realized that the very old idea of partial equivalence relations 
on types could be applied to produce a large and rich category containing many 
specific categories of domains and allowing a suitable general logic. The category 
is called Equ, the category of equilogical spaces. 

The simplest definition is the category of $T_0$ spaces and total equivalence
relations with continuous maps that are equivariant (meaning, preserving the
equivalence relations). An equivalent definition uses algebraic (or continuous)
lattices and partial equivalence relations, together with continuous equivariant
maps. This category is not only cartesian closed, but it is locally cartesian closed
(that is, it has dependent sums and products). Moreover, it contains as a full
subcategory all $T_0$ spaces (and therefore the category of sets and the category
of domains).

The logic for this category is intuitionistic and can be explained by a form
of the realizability interpretation, as will be outlined in the lecture. The project
now is to use this idea as a unifying platform for semantics and reasoning. In
the last four years the author has been cooperating with faculty and students
at Carnegie Mellon on this program, namely Steven Awodey, Andrej Bauer,
Lars Birkedal, and Jesse Hughes. A selection of papers (in reverse chronological
order) follows. These (and future papers) are available via the WWW at
http://www.cs.cmu.edu/Groups/LTC.
On Implications between P-NP-Hypotheses:
Decision versus Computation in Algebraic Complexity



Peter Bürgisser



Dept. of Mathematics and Computer Science, University of Paderborn,
D-33095 Paderborn, Germany
buergisser@upb.de



Abstract. Several models of NP-completeness in an algebraic framework
of computation have been proposed in the past, each of them hinging
on a fundamental hypothesis of type $\mathrm{P}\neq\mathrm{NP}$. We first survey some
known implications between such hypotheses and then describe attempts
to establish further connections. This leads us to the problem of relating
the complexity of computational and decisional tasks and naturally
raises the question about the connection of the complexity of a polynomial
with those of its factors. After reviewing what is known in this
respect, we discuss a new result involving a concept of approximative
complexity.



1 Introduction 

Algebraic complexity theory is the study of the intrinsic difficulty of
computational problems that can be posed in an algebraic or numerical framework.
Instead of basing this study on the model of the Turing machine, it uses
“algebraic models” of computation. Besides the fact that these models form a natural
theoretical framework for the algorithms commonly known to solve the problems
under consideration, the main motivation for this choice of model is the idea that
various methods from pure mathematics like algebraic geometry or topology
can be employed to establish lower bounds in this more structured world. This
project has been successful for problems of polynomially bounded complexity,
as illustrated by the large body of work presented in the textbook [14].

The theory of NP-completeness is one of the main cornerstones of
computational complexity, although it still rests on the unproven hypothesis that
$\mathrm{P}\neq\mathrm{NP}$. In seminal works by Valiant [45,47] and Blum, Shub and Smale [9], models of
NP-completeness in an algebraic framework of computation have been proposed,
motivated by the hope to prove the elusive separation of P and NP in these
frameworks. So far, this hope has not been fulfilled except in rather trivial cases.
However, there have been successful attempts at relating these different
models to each other and at establishing implications between the various
P-NP-hypotheses presently studied. The known “transfer theorems” either relate such
hypotheses of the same type referring to different fields, or provide implications
from the separation of P and NP in the classical bit model to a corresponding
separation in an algebraic model of computation. As an exception to this rule,
Fournier and Koiran [20] (see also [31]) recently proved an implication in the
reverse direction for a restricted algebraic model. This has challenged the hope
that a P-NP-separation in the algebraic model might be easier to prove than in
the bit model. We review some of the known transfer results in Section 3.

In Section 4 we will discuss an attempt [12] to relate the P-NP hypothesis 
in Valiant’s framework to the one in the Blum, Shub, Smale (BSS) framework, 
as well as an attempt [42] to establish a connection to the complexity of certain 
univariate polynomials. This leads us to the problem of relating the complexity 
of computational and decisional tasks and naturally raises the question about 
the connection of the complexity of a polynomial with those of its factors. This 
relationship is not well understood and turns out to be the bottleneck in both 
attempts. 

In Section 5 we review what is known about the complexity of factors. We
mention a result by Kaltofen [27], which seems to be little known in the
community (and was independently discovered by the author). It states that
the complexity of an irreducible factor $g$ of a polynomial $f$ can be polynomially
bounded in the complexity of $f$, the degree of $g$, and the multiplicity of $g$. We
believe that the dependence on the multiplicity is not necessary, but we have
been unable to prove this. Instead, we present a new result [10], which states that
the dependence on the multiplicity can be avoided when replacing complexity
by the related notion of “approximative complexity”, which is introduced in
Section 6.

As a major application, we obtain the following relative hardness result about
decision complexity over the reals: checking the values of polynomials forming
complete families in Valiant’s sense cannot be done with a polynomial number
of arithmetic operations and tests, unless the permanent has a p-bounded
approximative complexity, which seems unlikely. This hardness result extends to
randomized algorithms with two-sided error, formalized by randomized algebraic
computation trees.



2 Algebraic Models of NP-Completeness 

2.1 Blum-Shub-Smale Model 

Blum, Shub, and Smale [9] have extended the classical theory of NP-completeness
to a theory of computation over arbitrary rings, the three most interesting cases
being the fields of real and complex numbers and the finite field $\mathbb{F}_2$. In the latter
case, the classical theory of computation is recovered. As usual in algebraic
complexity theory, a basic computational step is an arithmetic operation, an
equality test, or a $<$-test if the field is ordered, and we make the idealizing
assumption that this can be done with infinite precision. Moreover, a uniformity
condition is assumed to be satisfied. For a recent account see [8]. Poizat [40]
describes an elegant approach to a P-NP-framework over general structures.
Many ideas of discrete structural complexity can be extended to such a
framework; in particular the complexity classes P, NP and the notion of
NP-completeness. In this framework, a natural NP-complete problem turns out to
be the feasibility problem: to decide for a given system of polynomials whether
they have a common root. In all three settings ($\mathbb{F}_2$, $\mathbb{R}$, $\mathbb{C}$) it is a fundamental
open problem whether $\mathrm{P}\neq\mathrm{NP}$ is true. It has been shown [7] that this question
has the same answer over all algebraically closed fields of characteristic zero.
Over the reals, no corresponding transfer theorem is known.

2.2 Valiant’s Algebraic Model 

In [45,47] Valiant proposed an analogue of the theory of #P-completeness in a
framework of algebraic complexity, in connection with his famous hardness
result for the permanent [46]. This theory features algebraic complexity classes VP
and VNP as well as VNP-completeness results for many families of generating
functions of graph properties, the most prominent being the family of
permanents. For a comprehensive presentation of this theory, we refer to [21,14] and
to the recent account [12].

While the complexity classes in the BSS model capture decision problems,
the Valiant classes deal with the computational problem of evaluating multivariate
polynomials. Only straight-line computations are considered and uniformity is
not taken into account. The basic object studied is a p-family over a fixed field $k$,
which is a sequence $(f_n)$ of multivariate polynomials such that the number of
variables as well as the degree of $f_n$ are polynomially bounded (p-bounded)
functions of $n$. The complexity class VP over $k$ consists of the p-families $(f_n)$
which are p-computable, which means that the straight-line complexity of $f_n$
is p-bounded in $n$. Note that although $X^{2^n}$ can be computed with only $n$
multiplications, the corresponding sequence is not considered to be p-computable,
as the degrees grow exponentially. A p-family $(f_n)$ is called p-definable iff there
exists $(g_n) \in \mathrm{VP}$ such that for all $n$

$$f_n(X_1,\dots,X_{u(n)}) = \sum_{e\in\{0,1\}^{v(n)-u(n)}} g_n(X_1,\dots,X_{u(n)},e_{u(n)+1},\dots,e_{v(n)}),$$

where $u(n)$ and $v(n)$ denote the numbers of variables of $f_n$ and $g_n$, respectively.
The set of p-definable families forms the complexity class VNP. The class VP is
obviously contained in VNP, and Valiant’s hypothesis claims that this inclusion
is strict. It has been shown in [12, § 4.1] that over algebraically closed fields, this
hypothesis depends at most on the characteristic of the field.
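Valiant's definition of p-definability writes $f_n$ as a sum of $g_n(X, e)$ over all Boolean assignments $e$ to the extra variables. As an illustration (a sketch of our own, not taken from [12] or [45,47]), the permanent can be written in this form by letting $g_n$ multiply a polynomial indicator of "$e$ is a permutation matrix" with $\prod_i \sum_j e_{ij} X_{ij}$. The helper names `exactly_one`, `g` and `per_as_boolean_sum` are hypothetical, and the sums are only evaluated numerically for small integer matrices; the exponential enumeration is there purely to check the identity.

```python
from itertools import permutations, product
from math import prod


def per_bruteforce(x):
    """Permanent by summing over all permutations (exponential time)."""
    n = len(x)
    return sum(prod(x[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def exactly_one(v):
    """Polynomial sum_j v_j * prod_{k != j} (1 - v_k).

    On a 0/1 vector it equals 1 if exactly one entry is 1, and 0 otherwise.
    """
    return sum(
        v[j] * prod(1 - v[k] for k in range(len(v)) if k != j)
        for j in range(len(v))
    )


def g(x, e):
    """Hypothetical g_n(X, e) = [e is a permutation matrix] * prod_i sum_j e_ij X_ij."""
    n = len(x)
    rows = [[e[i][j] for j in range(n)] for i in range(n)]
    cols = [[e[i][j] for i in range(n)] for j in range(n)]
    indicator = prod(exactly_one(r) for r in rows) * prod(exactly_one(c) for c in cols)
    return indicator * prod(sum(e[i][j] * x[i][j] for j in range(n)) for i in range(n))


def per_as_boolean_sum(x):
    """PER_n(x) as the sum of g_n(x, e) over all e in {0,1}^(n*n)."""
    n = len(x)
    total = 0
    for bits in product((0, 1), repeat=n * n):
        e = [bits[i * n:(i + 1) * n] for i in range(n)]
        total += g(x, e)
    return total


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(per_bruteforce(m), per_as_boolean_sum(m))  # 450 450
```

Each factor of `g` is a polynomial of size polynomial in $n$, which is the kind of structure p-definability asks for; whether such a $g_n$ is p-computable in general is exactly what the definition of VP governs.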

3 Known Implications between P-NP-Hypotheses 

In each of the algebraic models discussed before we have raised the fundamental
question whether $\mathrm{P}\neq\mathrm{NP}$. Recently, a considerable amount of research has
been directed towards establishing “transfer theorems” that provide implications
from the separation of P and NP in the classical bit model to a corresponding
separation in an algebraic model of computation. These results rely on various
techniques to eliminate the real or complex constants that may be used by a
computation in the algebraic model on a Boolean input. One of the first results
of this kind was established by Koiran [29] for the additive BSS model over the
reals, which allows additions and subtractions as the only arithmetic operations.
Koiran showed that

$$\mathrm{P}\neq\mathrm{NP}\ \text{nonuniformly} \;\Longrightarrow\; \mathrm{P}\neq\mathrm{NP}\ \text{over } \mathbb{R} \text{ as an ordered group}. \qquad (1)$$

Here, the elimination of constants is based on polyhedral geometry, in
particular on the fact that a nonempty polyhedron defined by a system of
inequalities with small coefficients has a small rational point. Quite astonishingly,
the converse of the above implication is also true, as recently established by
Fournier and Koiran [20]. The proof of the converse of (1) relies on Meyer auf
der Heide’s [37,39] construction of small depth linear decision trees for locating
points in arrangements of hyperplanes.

For the unrestricted BSS model over the reals (with order) no implication
in either direction is known. Over the complex numbers, we know the following
about the unrestricted BSS model:

$$\mathrm{P}\neq\mathrm{NP}\ \text{nonuniformly} \;\Longrightarrow\; \mathrm{P}\neq\mathrm{NP}\ \text{over } \mathbb{C}, \qquad (2)$$

as noticed independently by several researchers. The essential point here is that,
using modular arithmetic, it is possible to test in random polynomial time
whether an integer given by a straight-line program equals zero. Actually, the
left-hand side can be replaced by the weaker statement $\mathrm{NP}\not\subseteq\mathrm{BPP}$. (A proof can
be found in Cucker et al. [18], although somewhat hidden.) We do not know how
to efficiently test for positivity of an integer given by a straight-line program,
and this seems to be the main obstacle for establishing an implication analogous
to (2) over the reals.
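The modular zero test mentioned above can be sketched in Python under simplifying assumptions. A straight-line program of length $\ell$ starting from the constant 1 can produce an integer with about $2^{\ell}$ bits, far too large to write down, yet its residue modulo a prime $p$ costs only $\ell$ operations. A nonzero integer with $B$ bits has at most $B/(b-1)$ distinct prime divisors of bit length $b$, so a random prime from a range containing many more primes than that detects nonzeroness with good probability. The program format, the fixed 40-bit prime range and the number of trials below are hypothetical illustrative choices; in a real algorithm the bit length of the primes must grow with $\ell$.

```python
import random


def is_prime(n):
    """Deterministic Miller-Rabin for n < 3.3 * 10**24 (fixed prime bases)."""
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def eval_slp_mod(program, p):
    """Evaluate a straight-line program modulo p.

    Value 0 is the constant 1; instruction t = (op, i, j) defines value t + 1
    as value_i op value_j with op in {'+', '-', '*'}.
    """
    vals = [1 % p]
    for op, i, j in program:
        a, b = vals[i], vals[j]
        if op == "+":
            vals.append((a + b) % p)
        elif op == "-":
            vals.append((a - b) % p)
        elif op == "*":
            vals.append((a * b) % p)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return vals[-1]


def probably_zero(program, trials=20, bits=40, rng=random):
    """One-sided randomized zero test: False means 'certainly nonzero'."""
    for _ in range(trials):
        p = rng.randrange(2 ** (bits - 1), 2 ** bits)
        while not is_prime(p):
            p = rng.randrange(2 ** (bits - 1), 2 ** bits)
        if eval_slp_mod(program, p) != 0:
            return False
    return True


def squarings(program, start, k):
    """Append k squarings of value `start`; return the index of the result."""
    cur = start
    for _ in range(k):
        program.append(("*", cur, cur))
        cur = len(program)
    return cur


def zero_program(k):
    """(2^(2^k))^2 - 4^(2^k): a huge-looking expression that equals 0."""
    prog = [("+", 0, 0)]               # value 1: the integer 2
    a = squarings(prog, 1, k)          # 2^(2^k)
    prog.append(("*", a, a))
    a2 = len(prog)                     # 2^(2^(k+1))
    prog.append(("*", 1, 1))           # the integer 4
    b = squarings(prog, len(prog), k)  # 4^(2^k) = 2^(2^(k+1))
    prog.append(("-", a2, b))
    return prog


def nonzero_program(k):
    """2^(2^k) - 1, an integer with 2^k bits."""
    prog = [("+", 0, 0)]
    a = squarings(prog, 1, k)
    prog.append(("-", a, 0))
    return prog
```

For example, `probably_zero(zero_program(20), rng=random.Random(0))` returns `True` without ever forming the million-bit integers involved. Note that the test has one-sided error: a `False` answer is always correct, while `True` is correct only with high probability.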

In [13] we have shown the following implication for Valiant’s model over $\mathbb{F}_2$:

$$\mathrm{NC}^2\neq\oplus\mathrm{P}\ \text{nonuniformly} \;\Longrightarrow\; \mathrm{VP}\neq\mathrm{VNP}\ \text{over } \mathbb{F}_2. \qquad (3)$$

More specifically, by interpreting families of polynomials over $\mathbb{F}_2$ as Boolean
functions, we can assign to an algebraic class $C$ its Boolean part $\mathrm{BP}(C)$. It turns
out that over $\mathbb{F}_2$ we have

$$\mathrm{NC}^1/\mathrm{poly} \subseteq \mathrm{BP}(\mathrm{VP}) \subseteq \mathrm{NC}^2/\mathrm{poly}, \qquad \mathrm{BP}(\mathrm{VNP}) = \oplus\mathrm{P}/\mathrm{poly},$$

which immediately implies (3). Note that the class VNP over $\mathbb{F}_2$ is by definition
closely related to $\oplus\mathrm{P}$. On the other hand, the Boolean function corresponding
to a p-computable family lies in $\mathrm{NC}^2/\mathrm{poly}$, due to the fact [6] that an algebraic
straight-line program of size $n^{O(1)}$ computing a polynomial of degree $n^{O(1)}$ can
be efficiently parallelized, i.e., simulated by a straight-line program of size $n^{O(1)}$
and depth $O(\log^2 n)$. A challenging open problem is to find out to what extent
implication (3) might be reversed: does $\mathrm{BP}(\mathrm{VP}) = \mathrm{BP}(\mathrm{VNP})$ imply that
$\mathrm{VP} = \mathrm{VNP}$ over $\mathbb{F}_2$? As a first step in this direction, it seems rewarding to locate
$\mathrm{BP}(\mathrm{VP})$ exactly in the hierarchy of known complexity classes between $\mathrm{NC}^1/\mathrm{poly}$
and $\mathrm{NC}^2/\mathrm{poly}$.

For Valiant’s model in characteristic zero, we have proved in [13] the following
implication similar to (3):

$$\mathrm{FNC}^3\neq\#\mathrm{P}\ \text{nonuniformly} \;\Longrightarrow\; \mathrm{VP}\neq\mathrm{VNP}\ \text{in characteristic zero},$$

conditional on the generalized Riemann hypothesis. Besides the ideas mentioned
for the field $\mathbb{F}_2$, the proof is based on some elimination of constants, which is
achieved by a general result about the frequency of primes $p$ with the property
that a system of integer polynomial equations solvable over $\mathbb{C}$ has a solution
modulo $p$.

In Section 4.2 we will address the question whether $\mathrm{VP}\neq\mathrm{VNP}$ implies
$\mathrm{P}\neq\mathrm{NP}$ over $\mathbb{C}$. The difficulty here is not to eliminate constants, but to link the
complexity of decisional problems to the complexity of computational problems.



4 Attempts to Establish Further Connections 

4.1 Linking Decisional to Computational Complexity 

Are there functions whose value can be checked in polynomial time but which
cannot be computed in polynomial time? In fact, this is a basic assumption in
cryptography, since it turns out to be equivalent to the existence of one-way
functions [24,41]. We now ask this question for a real or complex polynomial
function $g$. More specifically, we ask whether deciding $g(x) = 0$ can be done
considerably faster than just by computing $g$ at input $x$.

In order to make this formal, we introduce the computational complexity $L(g)$
and the decisional complexity $C(g)$. The first one refers to the straight-line model
of computation and counts all arithmetic operations (divisions are not allowed
for simplicity). The decisional complexity refers to algebraic computation trees
and counts, besides arithmetic operations, also branchings according to equality
tests (and $<$-tests over the reals). Clearly, $C(g) \le L(g) + 1$. In both cases, we
allow any real or complex numbers to be used as constants.

Assume that $g$ is the product of $m$ real linear polynomials in $n$ variables:
$g = h_1 \cdots h_m$. In the case $n = 1$ we obviously have $C(g) = O(\log m)$ using
binary search. The result by Meyer auf der Heide [37,39] and Meiser [36] extends
this to higher dimensions and states that it is possible to locate a given point
$x \in \mathbb{R}^n$ in the hyperplane arrangement given by $h_1, \dots, h_m$ by an algebraic
computation tree with depth $(n\log m)^{O(1)}$. (In fact, linear decision trees are
sufficient for this, see [19].) We therefore have $C(g) = (n\log m)^{O(1)}$. On the other
hand, one can show that the complexity of evaluating $g$ equals $\Theta(mn)$ if the $h_i$
are in general position [14, Chap. 5]. This example shows that computational and
decisional complexity may differ dramatically. In Section 7.1 we will provide strong
evidence that this phenomenon occurs neither over the complex numbers nor over the
reals if $g$ is irreducible and has a degree polynomially bounded in its complexity.
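For $n = 1$, the bound $C(g) = O(\log m)$ is ordinary binary search. The following Python sketch assumes, for illustration, that each factor is monic, $h_i = X - a_i$, and that the real roots $a_i$ are known in advance and sorted (they play the role of constants built into the computation tree). The function name is hypothetical, and the sketch counts tests rather than building a formal computation tree.

```python
import math


def vanishes_by_binary_search(roots, x):
    """Decide g(x) = 0 for g = (X - a_1) ... (X - a_m), given sorted real a_i.

    Each loop iteration corresponds to one subtraction x - a_mid followed by
    a sign test. Returns (answer, number_of_tests).
    """
    lo, hi = 0, len(roots)
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1                      # sign test on x - a_mid
        if x > roots[mid]:
            lo = mid + 1
        else:
            hi = mid
    # roots[lo] is now the smallest root >= x (if any); one equality test.
    tests += 1
    return lo < len(roots) and roots[lo] == x, tests


if __name__ == "__main__":
    roots = [float(i) for i in range(1, 1025)]
    print(vanishes_by_binary_search(roots, 512.0))
    print(math.ceil(math.log2(len(roots) + 1)) + 1)  # upper bound on the tests
```

With $m = 1024$ roots this uses at most 12 tests, whereas evaluating $g(x)$ as a product takes on the order of $m$ operations.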
The following well-known lemma [16,8] provides a link from decisional to
computational complexity. It naturally leads to the question of relating the 
complexity of a polynomial to those of its multiples, which will be investigated 
systematically in Section 5. 

Lemma 1. There exists a nonzero multiple $f$ of $g$ such that $L(f) \le C(g)$.
Over $\mathbb{C}$ this is true without additional assumptions; over $\mathbb{R}$ we have to require
that $g$ is irreducible.



4.2 Does Valiant’s Hypothesis Imply the BSS Hypothesis over $\mathbb{C}$?

In [12, §8.4] we have conjectured that Valiant’s hypothesis implies the BSS
hypothesis over $\mathbb{C}$. Loosely speaking, this means that if the permanent is
intractable, then solving systems of polynomial equations over $\mathbb{C}$ is intractable
as well. The following reasoning from [12], similar to that of Heintz and
Morgenstern [25], has led us to this conjecture.

The real, weighted cycle cover problem is the following decision problem:
given a real $n \times n$ matrix $[w_{ij}]$ of weights and a real number $s$, one has to decide
whether there exists some permutation $\pi \in S_n$ such that $\sum_{i=1}^{n} w_{i,\pi(i)} = s$. (Note
that a permutation can be visualized as a cycle cover.) By the above-mentioned
result [37,39], this decision problem can be solved by algebraic computation trees
over the reals with depth $n^{O(1)}$. We now reformulate this problem as follows:
let $x_{ij} = 2^{w_{ij}}$ and $y = 2^{-s}$. Then the above question amounts to testing whether
the (reducible) polynomial

$$g_n := \prod_{\pi\in S_n} \bigl(1 - Y X_{1,\pi(1)} \cdots X_{n,\pi(n)}\bigr)$$

vanishes for a given (positive) real matrix $x = [x_{ij}]$ and $y > 0$.

If we expand $g_n$ according to powers of $Y$, we see that the permanent of
the matrix $[X_{ij}]$ is, up to sign, the coefficient of $Y$: in fact,
$g_n = 1 - Y\,\mathrm{PER}_n(X) + O(Y^2)$. A variant of a well-known result on the
computation of homogeneous parts implies $L(\mathrm{PER}_n) \le 4L(g_n)$ (compare [14, § 7.1]).
Therefore, $g_n$ is hard to compute if Valiant’s hypothesis is true.
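The expansion of $\prod_{\pi\in S_n}(1 - Y X_{1,\pi(1)}\cdots X_{n,\pi(n)})$ in powers of $Y$ can be checked numerically by multiplying the $n!$ factors modulo $Y^2$. A small Python sketch for integer matrices; the function names are hypothetical, and the loop over all permutations is exponential, so it is only meant for tiny $n$.

```python
from itertools import permutations
from math import prod


def mul_mod_y2(a, b):
    """Multiply a0 + a1*Y by b0 + b1*Y and drop the Y^2 term."""
    return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])


def g_n_mod_y2(x):
    """Coefficients (c0, c1) of prod_pi (1 - Y * x_{1,pi(1)} ... x_{n,pi(n)}) mod Y^2."""
    n = len(x)
    acc = (1, 0)
    for p in permutations(range(n)):
        monomial = prod(x[i][p[i]] for i in range(n))
        acc = mul_mod_y2(acc, (1, -monomial))
    return acc


if __name__ == "__main__":
    print(g_n_mod_y2([[1, 2], [3, 4]]))  # (1, -10): constant 1, Y-coefficient -PER
```

Since each factor contributes $1$ to the constant term, the coefficient of $Y$ is minus the sum of all monomials $x_{1,\pi(1)}\cdots x_{n,\pi(n)}$, that is, $-\mathrm{PER}_n(x)$.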

It is easy to see that the problem of checking whether $g_n(x,y) = 0$ gives rise
to a problem in the BSS complexity class NP over the reals. Indeed, for a given
matrix $x$ and $y \in \mathbb{R}$ it suffices to guess the permutation $\pi$ and to check that

$$y\, x_{1,\pi(1)} \cdots x_{n,\pi(n)} = 1.$$
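A minimal Python sketch of this NP verifier, assuming integer weights so that $x_{ij} = 2^{w_{ij}}$ and $y = 2^{-s}$ are exact rationals (real weights would need exact real arithmetic, which this sketch does not model). The function names are hypothetical; the exhaustive search in `has_cycle_cover_of_weight` merely stands in for the nondeterministic guess of $\pi$.

```python
from fractions import Fraction
from itertools import permutations


def certificate_ok(w, s, perm):
    """Check y * x_{1,perm(1)} ... x_{n,perm(n)} == 1 with x_ij = 2^w_ij, y = 2^-s."""
    n = len(w)
    value = Fraction(2) ** (-s)
    for i in range(n):
        value *= Fraction(2) ** w[i][perm[i]]
    return value == 1


def has_cycle_cover_of_weight(w, s):
    """Exhaustive stand-in for the guess: try every permutation as a certificate."""
    return any(certificate_ok(w, s, p) for p in permutations(range(len(w))))


if __name__ == "__main__":
    w = [[0, 3], [1, 5]]
    print(has_cycle_cover_of_weight(w, 4), has_cycle_cover_of_weight(w, 6))  # True False
```

The product test succeeds exactly when $\sum_i w_{i,\pi(i)} - s = 0$, because $y\,x_{1,\pi(1)}\cdots x_{n,\pi(n)} = 2^{\sum_i w_{i,\pi(i)} - s}$.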

We now present an attempt to deduce $\mathrm{P}\neq\mathrm{NP}$ over $\mathbb{C}$ from Valiant’s
hypothesis, based on the decision problem $g_n(x, y) = 0$ over the complex numbers.
Assume that $\mathrm{P}=\mathrm{NP}$ over $\mathbb{C}$. Then the above decision problem would lie in
P and therefore could be solved by algebraic computation trees of depth
polynomially bounded in $n$, hence $C(g_n) = n^{O(1)}$. By Lemma 1 there is a nonzero
multiple $f_n$ of $g_n$ for each $n$ such that $L(f_n) \le C(g_n)$, hence $L(f_n) = n^{O(1)}$.

If we could derive from this that the factor $g_n$ has a complexity polynomially
bounded in $n$, then we could conclude $L(\mathrm{PER}_n) = n^{O(1)}$, which contradicts
Valiant’s hypothesis, as the permanent family is VNP-complete.
4.3 Univariate Polynomials and $\mathrm{P}\neq\mathrm{NP}$ over $\mathbb{C}$

For given $n \in \mathbb{N}$ consider the problem (“Twenty Questions”) of deciding for a given
complex number $x$ whether $x \in \{1, 2, \dots, n\}$. The complexity of this problem in
the model of computation trees over $\mathbb{C}$ thus equals $C(p_n)$, where

$$p_n(X) := (X-1)(X-2)\cdots(X-n)$$

are the Pochhammer-Wilkinson polynomials. Shub and Smale [42] made the following
reasoning, similar to the one in Section 4.2. By considering the parameter $n$
also as an input, the above decision problem is easily seen to lie in the class NP
over $\mathbb{C}$, when we consider the pair $(n, x)$ as an instance of size $\ell = \lceil \log n \rceil$. The
point is that one can check whether $x$ is an integer in the range $0, 1, \dots, 2^{\ell} - 1$
by guessing complex numbers $w_0, \dots, w_{\ell-1}$ and checking that $x = \sum_{i=0}^{\ell-1} w_i 2^i$
and $w_i(w_i - 1) = 0$ for all $i$.
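The certificate check just described can be sketched in Python; the function names are hypothetical. The equations $w_i(w_i - 1) = 0$ force every guessed complex number to be 0 or 1, so in this sketch the nondeterministic guess is replaced by exhaustive search over $\{0,1\}^{\ell}$, which loses nothing because every other guess is rejected anyway.

```python
from itertools import product


def check_bits(x, w):
    """Verify a guessed certificate w_0, ..., w_{l-1} for 'x is in {0, ..., 2^l - 1}'."""
    boolean = all(wi * (wi - 1) == 0 for wi in w)    # forces each w_i into {0, 1}
    return boolean and x == sum(wi * 2 ** i for i, wi in enumerate(w))


def in_range_by_guessing(x, ell):
    """Exhaustive stand-in for the nondeterministic guess over {0,1}^ell."""
    return any(check_bits(x, w) for w in product((0, 1), repeat=ell))


if __name__ == "__main__":
    print(in_range_by_guessing(13, 4), in_range_by_guessing(3 + 1j, 4))  # True False
```

The verifier itself uses only $O(\ell)$ arithmetic operations and equality tests, which is what places the problem in NP over $\mathbb{C}$.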

Assume that $\mathrm{P}=\mathrm{NP}$ over $\mathbb{C}$. Then the above decision problem would lie in P
and we would have $C(p_n) = (\log n)^{O(1)}$. By Lemma 1 there would exist a nonzero
multiple $f_n$ of $p_n$ with complexity $L(f_n) \le C(p_n) = (\log n)^{O(1)}$. This argument
can be refined: by eliminating the finite set of complex constants used by a
BSS machine, it is possible to achieve that the multiple $f_n$ of $p_n$ is an integer
polynomial computed by a straight-line program of length $(\log n)^{O(1)}$ using 1 as the
only constant. We call the minimal length of a straight-line program satisfying
this additional condition the $\tau$-complexity $\tau(f)$. Clearly, $L(f) \le \tau(f)$.

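Shub and Smale [42] set up the so-called $\tau$-conjecture, which claims the
following connection between the number $z(f)$ of distinct integer roots of a
univariate integer polynomial $f$ and its $\tau$-complexity:

$$z(f) \le (1 + \tau(f))^c,$$

where $c > 0$ is a universal constant. The $\tau$-conjecture thus implies $\mathrm{P}\neq\mathrm{NP}$ over
the field of complex numbers: a nonzero multiple $f_n$ of $p_n$ has at least $n$ distinct
integer roots, which is incompatible with $\tau(f_n) = (\log n)^{O(1)}$.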

In order to illustrate that the $\tau$-conjecture is of a number-theoretic quality, let
us ask a more general question. Let $k$ be a field, $f$ be a polynomial in $n$ variables
over $k$, and $d \in \mathbb{N}$. We write $N_d(f)$ for the number of irreducible factors of $f$
over $k$ having degree at most $d$, not counting multiplicity. For a fixed field $k$, we
raise the following question:

$$\exists c > 0\ \forall n, d\ \forall f \in k[X_1, \dots, X_n]:\ N_d(f) \le (L(f) + d)^c. \qquad (4)$$

It is clear that this statement over $\mathbb{Q}$ implies the $\tau$-conjecture: for $n = d = 1$,
(4) bounds the number of all rational roots of $f$ and measures complexity with $L$
instead of $\tau$, while $z(f) \le N_1(f)$ and $L(f) \le \tau(f)$. Referring to question (4), we
observe the following:

(i) If (4) is true over some field $k$, then it must be true over the rationals.

(ii) Question (4) is false over finite fields, real or algebraically closed fields, and
p-adic fields.

(iii) Over number fields one may equivalently take $n = 1$ in (4).
The proof is simple: first note that property (4) is inherited by subfields of $k$.
A counterexample over $k = \mathbb{F}_q$ is provided by the factorization of $f = X^{q^d} - X$
into the product of all monic irreducible polynomials over $\mathbb{F}_q$ whose degree is
a divisor of $d$. This example can be lifted to the p-adics. Over $\mathbb{R}$ and $\mathbb{C}$ one
may use $f = X^n - 1$ or the Chebyshev polynomials. The proof of (iii) is an
immediate consequence of the Hilbert irreducibility theorem [32]. (We note that
a corresponding conclusion with $\tau$ instead of $L$ is not clear.)
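The counterexample over $\mathbb{F}_q$ can be quantified with Gauss's formula $N_k = \frac{1}{k}\sum_{e \mid k} \mu(e)\, q^{k/e}$ for the number of monic irreducible polynomials of degree $k$ (a standard fact that the text above does not state). The Python sketch below, with hypothetical helper names, counts the irreducible factors of $X^{q^d} - X$ via this formula rather than by actually factoring, checks the degree identity $\sum_{k \mid d} k N_k = q^d$, and bounds $L(f)$ by the multiplications of binary exponentiation plus one subtraction.

```python
def mobius(n):
    """Moebius function by trial division."""
    result, p, m = 1, 2, n
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0               # p^2 divides n
            result = -result
        p += 1
    if m > 1:
        result = -result               # one remaining prime factor
    return result


def monic_irreducibles(q, k):
    """Number of monic irreducible polynomials of degree k over F_q (Gauss)."""
    total = sum(mobius(e) * q ** (k // e) for e in range(1, k + 1) if k % e == 0)
    return total // k


def factors_of_frobenius_poly(q, d):
    """N_d(f) for f = X^(q^d) - X over F_q (all its factors have degree dividing d)."""
    return sum(monic_irreducibles(q, k) for k in range(1, d + 1) if d % k == 0)


def square_and_multiply_cost(e):
    """Multiplications used by binary exponentiation to compute X^e (e >= 1)."""
    return (e.bit_length() - 1) + (bin(e).count("1") - 1)


if __name__ == "__main__":
    q, d = 2, 20
    print(factors_of_frobenius_poly(q, d), square_and_multiply_cost(q ** d) + 1)
```

For $q = 2$ and $d = 20$ this gives more than 50000 irreducible factors against at most 21 arithmetic operations; since $N_d(f)$ grows like $q^d/d$ while $L(f) = O(d \log q)$, no bound of the form $(L(f) + d)^c$ can hold.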

It is interesting to note that the Pochhammer-Wilkinson polynomial $p_n$ can
be evaluated with $O(\sqrt{n}\log^2 n)$ arithmetic operations. Indeed, assume $n = m^2$
and write, for an input $x$,

$$p_{m^2}(x) = \prod_{0\le q<m}\ \prod_{1\le r\le m} (x - qm - r) = \prod_{0\le q<m} h_m(q),$$

where $h_m(Y) := \prod_{r=1}^{m} (x - mY - r) = \sum_{i=0}^{m} a_i(x)\, Y^i$. Using FFT-based fast
arithmetic (cf. [14, Chap. 2]) one can compute on input $x$ all coefficients $a_i(x)$
and then use multiple evaluation to obtain $h_m(q)$ for all $q < m$ using only
$O(m\log^2 m)$ arithmetic operations. A more detailed reasoning yields
$\tau(p_{m^2}) = O(m\log^2 m\log\log m)$. We remark that a similar idea was first formulated for
the efficient computation of factorials [44].
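The splitting $p_{m^2}(x) = \prod_{q<m} h_m(q)$ with $h_m(Y) = \prod_{r=1}^{m}(x - mY - r)$ can be checked with a short Python sketch. It deliberately uses schoolbook multiplication and Horner evaluation, so it needs $O(m^2)$ operations and does not attain the $O(m\log^2 m)$ bound, which requires FFT-based multiplication and fast multipoint evaluation as cited above; the function names are hypothetical.

```python
from math import prod


def poly_mul(a, b):
    """Schoolbook product of coefficient lists (constant term first)."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] += ai * bj
    return res


def horner(coeffs, t):
    """Evaluate sum_i coeffs[i] * t**i."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def pochhammer_wilkinson_split(x, m):
    """Evaluate p_{m^2}(x) = (x-1)(x-2)...(x-m^2) as prod_{q<m} h_m(q)."""
    # Coefficients a_0(x), ..., a_m(x) of h_m(Y) = prod_{r=1}^{m} ((x - r) - m*Y).
    h = [1]
    for r in range(1, m + 1):
        h = poly_mul(h, [x - r, -m])
    # Evaluate h_m at q = 0, ..., m-1 (naive Horner, not fast multipoint evaluation).
    return prod(horner(h, q) for q in range(m))


if __name__ == "__main__":
    print(pochhammer_wilkinson_split(100, 3) == prod(100 - j for j in range(1, 10)))  # True
```

The point of the decomposition is that only $m = \sqrt{n}$ evaluations of a single degree-$m$ polynomial are needed, instead of $n$ separate factors.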

We do not know whether the sequence of Pochhammer-Wilkinson polynomials
$(p_n)$ is hard to compute in the sense that $L(p_n) \ge n^{\varepsilon}$ for some $\varepsilon > 0$.
However, we can make the following interesting observation:

$$p_n(X^2) = g_n \cdot \tilde{g}_n, \qquad g_n := \prod_{j=1}^{n} (X - \sqrt{j}), \quad \tilde{g}_n := \prod_{j=1}^{n} (X + \sqrt{j}).$$

Using techniques of algebraic complexity theory, Heintz and Morgenstern [25]
were able to prove that each of the sequences $(g_n)$ and $(\tilde{g}_n)$ is hard to compute
(see also Baur [4]).

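If we were able to extend the hardness proof for $g_n$ to all of its nonzero
multiples, then we would have proved that all nonzero multiples of $p_n$ are hard and
thus $\mathrm{P}\neq\mathrm{NP}$ over $\mathbb{C}$. (If $f$ is a nonzero multiple of $p_n$, then $f(X^2)$ is a nonzero
multiple of $g_n$ and $L(f(X^2)) \le L(f) + 1$.) The problem of relating the complexity of a
polynomial to that of its nonzero multiples thus appears to be closely related to the
P-NP problem over the complex numbers.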

We remark that Aldaz et al. [1] showed that hard to compute

and Baur and Halupczok [5] proved a corresponding lower bound for all the 
nonzero multiples of these polynomials. 



5 Complexity of Factors 

We now proceed with a systematic investigation of the relationship between the
complexity of a polynomial $f$ and that of its factors. The first question to ask is whether
the complexity of a factor $g$ can always be polynomially bounded in the
complexity of $f$. Our developments in Section 4.3 already indicate that the answer
to this question is negative, since a positive answer would provide a proof of
$\mathrm{P}\neq\mathrm{NP}$ over $\mathbb{C}$. The answer is indeed negative, as first discovered by Lipton
and Stockmeyer [35]. The simplest known example illustrating this is as follows:
consider $f_n = X^{2^n} - 1 = \prod_{j<2^n} (X - \zeta^j)$, where $\zeta = \exp(2\pi i/2^n)$. By repeated
squaring we get $L(f_n) \le n + 1$. On the other hand, one can prove that for almost
all $M \subseteq \{0, 1, \dots, 2^n - 1\}$ the random factor $\prod_{j\in M} (X - \zeta^j)$ has a complexity
which is exponential in $n$, cf. [14, Exercise 9.8]. A similar reasoning can be made
over the rationals based on the factorization into the cyclotomic polynomials.
This idea yields reducible factors of high complexity.
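The bound $L(f_n) \le n + 1$ for $f_n = X^{2^n} - 1$ comes from a straight-line program with $n$ squarings followed by one subtraction. A minimal Python sketch (the function name is hypothetical) that counts these operations and checks in floating point that the powers $\zeta^j$ of $\zeta = \exp(2\pi i/2^n)$ are zeros of $f_n$:

```python
import cmath


def f_n_by_squaring(x, n):
    """Evaluate f_n(x) = x^(2^n) - 1 by n squarings and one subtraction.

    Returns (value, number_of_operations); the count is n + 1.
    """
    ops = 0
    v = x
    for _ in range(n):
        v = v * v
        ops += 1
    v = v - 1
    ops += 1
    return v, ops


if __name__ == "__main__":
    n = 5
    zeta = cmath.exp(2j * cmath.pi / 2 ** n)
    # Largest |f_n(zeta^j)| over all 2^n roots of unity (rounding error only).
    print(max(abs(f_n_by_squaring(zeta ** j, n)[0]) for j in range(2 ** n)))
```

The program is tiny although $f_n$ has degree $2^n$; the hardness statement above concerns factors that pick out an arbitrary subset $M$ of these roots.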

Problem 1. Construct a sequence of irreducible polynomials $g_n$ with complexity
exponential in $n$, which have a nonzero multiple $f_n$ with complexity polynomially
bounded in $n$.

In the above example, the degree of the factor $g$ is exponential in the
complexity of $f$. We now restrict our attention to factors having a degree polynomially
bounded in the complexity of $f$. A well-known result by Kaltofen [28] describes
a randomized polynomial time algorithm for factoring a multivariate
polynomial $f$ given by a straight-line program. Here, the upper bound is polynomial
in the straight-line complexity of $f$ and the degree of $f$. In a little-known
paper, Kaltofen [27] also proved that the complexity of any irreducible factor $g$
is polynomially bounded in the complexity of $f$, the degree of $g$ and the
multiplicity of $g$. This result, stated explicitly below, was independently found by the
author, compare [12, Thm. 8.14]. Here, the notation $M(d)$ stands for an upper
bound on the complexity of multiplying two univariate polynomials of
degree $d$ over $k$, e.g., $M(d) = O(d\log d)$ if the field $k$ “supports” fast Fourier
transforms.

Theorem 1. Assume $f = g^e h$ with polynomials $g, h \in k[X_1, \dots, X_n]$ which are
coprime. Let $d \ge 1$ be the degree of $g$ and suppose that $k$ is a field of characteristic
zero. Then we have

L{g) = 0{M{d^e){L{f) + dloge)). 

In [12, Conj. 8.3] we have conjectured that the dependence on the multiplicity
$e$ can be omitted, that is, we think that the complexity of the factor $g$ is
polynomially bounded in the complexity of $f$ and the degree $d$ of $g$.

In Section 7.1 we will present a new result [10] stating that the dependence on 
the multiplicity in Theorem 1 can indeed be omitted when replacing complexity 
by the related notion of “approximative complexity”, to be defined next. 

The fact that a computation with a polynomial number of steps may produce
intermediate results of exponential degree is well known to cause considerable
complications. Actually, the so-called weak BSS model of computation [30] was
defined in order to cope with this phenomenon, simply by forbidding such an
exponential growth of degree. The fact that the P-NP separation in the weak
model is trivial to obtain clearly shows that this model is a large
oversimplification. We remark that the Valiant model also excludes an exponential growth
of degrees during a computation. However, one can show [43] that this is no
restriction for the study of polynomials of p-bounded degree, such as permanents.
Note that resultants of systems of polynomial equations have a huge degree and
are not captured by Valiant’s framework.
6 Approximative Complexity

The concept of approximative complexity has been systematically studied in the 
framework of bilinear complexity (border rank) and there it has turned out to 
be one of the main keys to the currently best known fast matrix multiplication 
algorithms [17]. For computations of polynomials or rational functions, approx- 
imative complexity has been investigated in less detail. Griesser [22] generalized 
most of the known lower bounds for multiplicative complexity to approxima- 
tive complexity. Lickteig [33] as well as Grigoriev and Karpinski [23] employ the 
notion of approximative complexity for proving lower bounds. We refer to [14,
Chap. 15] and the references there for further information.



6.1 Algebraic Definition and Topological Characterization 

Let $f = f_q(X_1, \ldots, X_n)\,Y^q + f_{q+1}(X_1, \ldots, X_n)\,Y^{q+1} + \cdots$ be the expansion
of a polynomial f with respect to the variable Y. We do not know whether
the complexity of the leading coefficient $f_q$ can be polynomially bounded in
the complexity of f. However, we can make the following observation. For
the moment assume that k is the field of real or complex numbers. We have
$\lim_{y \to 0} y^{-q} f(X, y) = f_q(X)$ and $L(f(X, y)) \le L(f)$ for all $y \in k$. Thus we can
approximate $f_q$ with arbitrary precision by polynomials having complexity at
most L(f). We could say that $f_q$ has “approximate complexity” at most L(f).
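The limit argument can be checked numerically. The sketch below uses a hypothetical polynomial of our own choosing, with q = 2 and $f_q(X) = X + 1$, and shows that $y^{-q} f(x, y)$ approaches $f_q(x)$ as y tends to 0.

```python
def f(x, y):
    # hypothetical example: f = (X + 1) Y^2 + X^3 Y^3 + 5 Y^4, so q = 2 and f_q = X + 1
    return (x + 1) * y**2 + x**3 * y**3 + 5 * y**4

def rescaled(x, y, q=2):
    """y^(-q) * f(x, y); for each fixed y != 0 this is a polynomial in x."""
    return f(x, y) / y**q

x = 0.7
for y in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"y={y:g}: {rescaled(x, y):.8f}  (f_q(x) = {x + 1:.8f})")
```

The error is $x^3 y + 5y^2$, which vanishes as y tends to 0, so each member of the family is a specialization of f and the family converges to $f_q$.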

We will formalize this in an algebraic way; a topological interpretation will
be given later. In what follows, $K := k(\varepsilon)$ is a rational function field in the
indeterminate $\varepsilon$ over the field k and R denotes the local subring of K consisting
of the rational functions defined at $\varepsilon = 0$. We write $F_{\varepsilon=0}$ for the image of
$F \in R[X]$ under the morphism $R[X] \to k[X]$ induced by $\varepsilon \mapsto 0$.

Definition 1. Let $f \in k[X_1, \ldots, X_n]$. The approximative complexity $\underline{L}(f)$ of
the polynomial f is the smallest natural number r such that there exists F in
$R[X_1, \ldots, X_n]$ satisfying $F_{\varepsilon=0} = f$ and $L(F) \le r$. Here the complexity L is to
be interpreted with respect to the larger field of constants K.

Even though L refers to division-free straight-line programs, divisions will
occur implicitly since our model allows the free use of any elements of K as
constants. In fact, the point is that even though F is defined with respect to the
morphism $\varepsilon \mapsto 0$, the intermediate results of the computation may not be so.
Note that $\underline{L}(f) \le L(f)$.
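As a toy illustration of this remark (our own example, not taken from the paper), consider the straight-line program below over $K = k(\varepsilon)$. Two intermediate results contain $\varepsilon^{-1}$ and are therefore not defined at $\varepsilon = 0$, yet their difference $F = 2XY + \varepsilon Y^2$ lies in $R[X, Y]$ and $F_{\varepsilon=0} = 2XY$. The sketch only handles Laurent polynomials in $\varepsilon$, a small subset of K that suffices here; the helper names are hypothetical.

```python
from fractions import Fraction

# Polynomials in X, Y whose coefficients are Laurent polynomials in eps,
# stored as {(i, j, k): c} meaning c * X**i * Y**j * eps**k (k may be negative).

def add(p, q, sign=1):
    """p + sign * q, dropping zero coefficients."""
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + sign * c
        if out[m] == 0:
            del out[m]
    return out

def mul(p, q):
    """Product of p and q, dropping zero coefficients."""
    out = {}
    for (i1, j1, k1), c1 in p.items():
        for (i2, j2, k2), c2 in q.items():
            m = (i1 + i2, j1 + j2, k1 + k2)
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def defined_at_eps_zero(p):
    """True if p lies in R[X, Y], i.e. no negative power of eps occurs."""
    return all(k >= 0 for (_, _, k) in p)

def at_eps_zero(p):
    """The image of p under eps -> 0, as {(i, j): c}."""
    if not defined_at_eps_zero(p):
        raise ValueError("contains negative powers of eps")
    return {(i, j): c for (i, j, k), c in p.items() if k == 0}

X = {(1, 0, 0): Fraction(1)}
Y = {(0, 1, 0): Fraction(1)}
EPS = {(0, 0, 1): Fraction(1)}
EPS_INV = {(0, 0, -1): Fraction(1)}   # a free constant of K, undefined at eps = 0

t = add(X, mul(EPS, Y))               # X + eps*Y
u = mul(EPS_INV, mul(t, t))           # eps^-1 X^2 + 2XY + eps Y^2
v = mul(EPS_INV, mul(X, X))           # eps^-1 X^2
F = add(u, v, sign=-1)                # 2XY + eps Y^2, an element of R[X, Y]

print(defined_at_eps_zero(u), defined_at_eps_zero(v), defined_at_eps_zero(F))  # False False True
print(at_eps_zero(F))  # {(1, 1): Fraction(2, 1)}
```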

We remark that the assumption that any elements of K are free constants is 
made for achieving conceptual simplicity. We could as well require to build up 
the needed elements of K from $\varepsilon$, $\varepsilon^{-1}$ and elements of k. One can show that this
would not significantly change our main conclusions. 

The topological characterization of approximative complexity, to be pre- 
sented next, shows that this is a very natural notion from a mathematical point 
of view. Assume k to be an algebraically closed field. There is a natural way to
put a Zariski topology on the polynomial ring $A_n := k[X_1, \ldots, X_n]$ as a limit of
the Zariski topologies on the finite dimensional subspaces $\{f \in A_n \mid \deg f \le d\}$
for $d \in \mathbb{N}$. If k is the field of complex numbers, we may define the Euclidean
topology on $A_n$ in a similar way. If $f \in A_n$ satisfies $\underline{L}(f) \le r$, then it is easy to see
that f lies in the closure (Zariski or Euclidean) of the set $\{f \in A_n \mid L(f) \le r\}$.
Alder [2] has shown that the converse is true and obtained the following topo- 
logical characterization of the approximative complexity. 

Theorem 2. The set $\{f \in A_n \mid \underline{L}(f) \le r\}$ is the closure of the set
$\{f \in A_n \mid L(f) \le r\}$ for the Zariski topology. If $k = \mathbb{C}$, this is also true for the Euclidean
topology.

6.2 Computation of p-adic Coefficients

Let $f, p \in k[X_1, \ldots, X_n, Y]$ and assume p to be monic of degree $d \ge 1$ in Y.

Let $f = \sum_i f_i\, p^i$ be the p-adic representation of f, that is, $f_i \in k[X, Y]$ and
$\deg_Y f_i < d$. Using the idea in [14, § 7.1] it is not hard to see that the complexity
of the coefficient polynomial $f_i$ of $p^i$ can be polynomially bounded in d, i,
and L(f). The following observation shows that the dependence on the degree i
cannot be avoided in general.
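For a single variable Y (that is, ignoring the X-variables), the p-adic coefficients can be obtained by repeated division with remainder by the monic polynomial p. The following minimal sketch of that classical procedure works over the rationals; it is not the complexity-preserving construction referred to in [14, § 7.1], and the function names are hypothetical.

```python
from fractions import Fraction

def divmod_monic(f, p):
    """Quotient and remainder of f by a monic p (coefficient lists, lowest degree first)."""
    if p[-1] != 1:
        raise ValueError("p must be monic")
    d = len(p) - 1
    r = [Fraction(c) for c in f] + [Fraction(0)] * max(0, d - len(f))
    q = [Fraction(0)] * max(1, len(r) - d)
    for k in range(len(r) - 1, d - 1, -1):   # cancel the top coefficient of r
        c = r[k]
        q[k - d] = c
        for i, pc in enumerate(p):
            r[k - d + i] -= c * pc
    return q, r[:d]

def p_adic_digits(f, p):
    """Coefficients f_0, f_1, ... with f = sum_i f_i * p**i and deg f_i < deg p."""
    digits = []
    while any(f):
        f, rem = divmod_monic(f, p)
        digits.append(rem)
    return digits

# f = (2 + 3Y) + (-1) * p + Y * p**2 with p = Y**2 + 1
print(p_adic_digits([1, 4, -1, 2, 0, 1], [1, 0, 1]))
```

Each division lowers the degree by d, so the number of digits is roughly deg f / d; the coefficient $f_i$ is only available after i divisions, which matches the dependence on i discussed above.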

Proposition 1. The complexity of coefficient polynomials in a p-adic representation
of a polynomial f is not polynomially bounded in L(f) and $d = \deg p$,
unless Valiant’s hypothesis is false.

Consider the Y-adic representation of the following polynomial of complexity
$L(f_n) = O(n^2)$:

$$f_n := \prod_{i=1}^{n} \Big( \sum_{j=1}^{n} X_{ij}\, Y^{2^{j-1}} \Big) = \sum_{\ell} f_{n,\ell}(X)\, Y^{\ell},$$

and observe that the coefficient $f_{n,2^n-1}(X)$ equals the permanent of the
matrix $[X_{ij}]$. This already provides the proof of Proposition 1.
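The identity behind Proposition 1 can be checked by brute force on a small instance. A term of the product has Y-degree $2^n - 1 = 1 + 2 + \cdots + 2^{n-1}$ exactly when the column indices chosen in the n factors are pairwise distinct (n powers of two reach this sum only without carries), and these terms add up to the permanent. The sketch substitutes random integers for the $X_{ij}$; indices are 0-based, so $Y^{2^{j-1}}$ becomes `2 ** j`.

```python
import itertools
import random

def permanent(M):
    """Permanent by summing over all permutations (fine for small n)."""
    n = len(M)
    total = 0
    for sigma in itertools.permutations(range(n)):
        term = 1
        for i in range(n):
            term *= M[i][sigma[i]]
        total += term
    return total

def expand_fn(M):
    """Coefficients {exponent: value} of prod_i sum_j M[i][j] * Y**(2**j)."""
    coeffs = {0: 1}
    for row in M:
        nxt = {}
        for e, c in coeffs.items():
            for j, x in enumerate(row):
                nxt[e + 2**j] = nxt.get(e + 2**j, 0) + c * x
        coeffs = nxt
    return coeffs

rng = random.Random(1)
n = 4
M = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
print(expand_fn(M).get(2**n - 1, 0) == permanent(M))  # True
```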

Assume now that the p-adic representation $f = f_\ell\, p^\ell + f_{\ell+1}\, p^{\ell+1} + \cdots$ starts at
order $\ell$, $f_\ell \ne 0$. We can consider $f_\ell$ as the leading coefficient of f with respect to
the basis p. By contrast with Proposition 1, we can say the following about the
approximative complexity of the leading coefficient in relation to the complexity
of f (cf. [10]).

Proposition 2. The approximative complexity of the leading coefficient $f_\ell$ is
polynomially bounded in d and L(f): we have

$$\underline{L}(f_\ell) = O\big(M(d)\, L(f)\big).$$

7 Decision versus Computation 

7.1 Approximative Complexity of Factors 

Here is the result from [10], which eliminates the dependence on the multiplicity 
in Theorem 1 by switching to approximative complexity. The number $2 \le \omega <
2.38$ denotes the exponent of matrix multiplication (cf. [14, Chap. 15]).
Theorem 3. Assume that g is an irreducible factor of degree d of a polynomial
$f \in \mathbb{R}[X_1, \ldots, X_n]$. We assume that the zeroset of g is a hypersurface in $\mathbb{R}^n$.
Then we have for any $\varepsilon > 0$ that

L{g) = o{M{d^)Uf) + 

A corresponding result holds over C. Moreover, we remark that in the special 
situation, where g is the generator of the graph of a polynomial function, we 
obtain considerably better bounds, valid over any infinite field of characteristic 
zero. 

The idea of the proof of Theorem 3 is as follows: After a suitable coordinate
transformation, one can interpret the zeroset of the factor g locally around the
origin as the graph of some analytic function $\varphi$. In order to cope with a possibly
large multiplicity of g, we apply a small perturbation to the polynomial f without
affecting its complexity too much. This results in a small perturbation of $\varphi$. We
now compute the homogeneous parts of the perturbed $\varphi$ by a Newton iteration
up to a certain order. Using efficient polynomial arithmetic, this gives us an upper
bound on the approximative complexity of the homogeneous parts of $\varphi$ up to a
predefined order. In the special case where the factor g is the generator of the
graph of a function, we are already done. Otherwise, we view the factor g as the
minimal polynomial of $\varphi$ in the variable $Y := X_n$ over the field $k(X_1, \ldots, X_{n-1})$.
We show that the Taylor approximations of $\varphi$ up to a sufficiently high order uniquely determine the
factor g and compute the bihomogeneous components of g with respect to the
degrees in the X-variables and Y by fast linear algebra.
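The Newton step can be illustrated in a simplified, univariate setting. Given $g(t, y)$ such that $y_0$ is a simple root of $g(0, y)$, the iteration $y \leftarrow y - g(t, y)/g_y(t, y)$, with the precision doubled at each step, produces the power series of the implicit function $y = \varphi(t)$. The code below is our own minimal sketch over the rationals, not the algorithm of [10]; it ignores the perturbation and the multivariate homogeneous decomposition.

```python
from fractions import Fraction

def mul(a, b, n):
    """Product of two power series in t, truncated mod t**n (lowest degree first)."""
    out = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        if ai:
            for j, bj in enumerate(b[: n - i]):
                out[i + j] += ai * bj
    return out

def inv(a, n):
    """Inverse of a power series with a[0] != 0, truncated mod t**n."""
    out = [Fraction(0)] * n
    out[0] = 1 / Fraction(a[0])
    for k in range(1, n):
        s = sum(a[i] * out[k - i] for i in range(1, min(k, len(a) - 1) + 1))
        out[k] = -s * out[0]
    return out

def evaluate(coeffs, y, n):
    """sum_k coeffs[k](t) * y(t)**k mod t**n, by Horner's rule."""
    acc = [Fraction(0)] * n
    for c in reversed(coeffs):
        acc = mul(acc, y, n)
        for i, ci in enumerate(c[:n]):
            acc[i] += ci
    return acc

def newton_series_root(coeffs, y0, n):
    """Power series y(t) mod t**n with g(t, y(t)) = 0 and y(0) = y0.

    coeffs[k] is the t-polynomial multiplying y**k in g; y0 must be a simple
    root of g(0, y). Each Newton step doubles the number of correct terms.
    """
    dcoeffs = [[k * x for x in c] for k, c in enumerate(coeffs)][1:]  # dg/dy
    y, prec = [Fraction(y0)], 1
    while prec < n:
        prec = min(2 * prec, n)
        y = y + [Fraction(0)] * (prec - len(y))
        step = mul(evaluate(coeffs, y, prec), inv(evaluate(dcoeffs, y, prec), prec), prec)
        y = [yi - si for yi, si in zip(y, step)]
    return y

# y**2 - (1 + t) = 0 with y(0) = 1, i.e. y = sqrt(1 + t) = 1 + t/2 - t**2/8 + ...
print(newton_series_root([[-1, -1], [0], [1]], 1, 5))
```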

7.2 Applications to Decision Complexity 

By combining Theorem 3 with Lemma 1 we immediately get the following result 
stating that the approximative complexity of a polynomial g can be bounded 
polynomially in the decision complexity and the degree of g. 

Corollary 1. Let g be the generator of an irreducible hypersurface in $\mathbb{R}^n$ or
in $\mathbb{C}^n$, $d = \deg g$. Then we have for any $\varepsilon > 0$ that

L{g) = 0{M{d^)C{g) + d^^+^M{d)). 

It is quite natural to incorporate the concept of approximative complexity 
into Valiant’s algebraic framework of NP-completeness. 

Definition 2. An approximatively p-computable family is a p-family $(f_n)$ such
that $\underline{L}(f_n)$ is a p-bounded function of n. The complexity class $\overline{\mathrm{VP}}$ comprises all
such families over a fixed field k.

It is obvious that $\mathrm{VP} \subseteq \overline{\mathrm{VP}}$. If the polynomial f is a projection of a
polynomial g, then we clearly have $\underline{L}(f) \le \underline{L}(g)$. Therefore, the complexity class
$\overline{\mathrm{VP}}$ is closed under p-projections. We remark that $\overline{\mathrm{VP}}$ is also closed under the
polynomial oracle reductions introduced in [11].

At the moment, we know very little about the relations between the
complexity classes VP, $\overline{\mathrm{VP}}$, and VNP.
Problem 2. 1. Is the class VP strictly contained in $\overline{\mathrm{VP}}$?

2. Is the class $\overline{\mathrm{VP}}$ contained in VNP?

Since the class $\overline{\mathrm{VP}}$ is closed under p-projections, the following strengthening
of Valiant’s hypothesis is equivalent to saying that VNP-complete families are
not approximatively p-computable.

Conjecture 1. The class VNP is not contained in the class $\overline{\mathrm{VP}}$.

This conjecture should be compared with the known work on polynomial 
time deterministic or randomized approximation algorithms for the permanent 
of non-negative matrices [34,3,26]. Based on the Markov chain approach, Jerrum, 
Sinclair and Vigoda [26] have recently established a fully-polynomial random- 
ized approximation scheme for computing the permanent of an arbitrary real 
matrix with non-negative entries. We note that this result does not contradict 
Conjecture 1 since the above mentioned algorithm works only for matrices with 
non-negative entries while approximative straight-line programs work on all real 
inputs. 

Under the hypothesis $\mathrm{VNP} \not\subseteq \overline{\mathrm{VP}}$, we can conclude that checking the
values of polynomials forming VNP-complete families is hard, even when we allow
randomized algorithms with two-sided error, formalized by randomized algebraic
computation trees. For deterministic computations, this follows easily from
Corollary 1. The extension to randomized trees is straightforward using [38,15,18].

Corollary 2. Assume $\mathrm{VNP} \not\subseteq \overline{\mathrm{VP}}$ over $\mathbb{R}$. Then for any VNP-complete
family $(g_n)$, checking the value $y = g_n(x)$ over the reals cannot be done by
deterministic or randomized algebraic computation trees with a polynomial number of
arithmetic operations and tests in n.

By applying Corollary 1 to the permanent polynomial, we see that Conjec- 
ture 1 implies the following separation of complexity classes in the BSS-model 
of computation (cf. [8]). 

Corollary 3. If $\mathrm{VNP} \not\subseteq \overline{\mathrm{VP}}$ is true, then we have $\mathrm{P} \ne \mathrm{PAR}$ in the BSS-model
over the reals.
Abstract. Combinatorial games lead to several interesting, clean
problems in algorithms and complexity theory, many of which remain open.
The purpose of this paper is to provide an overview of the area to encour- 
age further research. In particular, we begin with general background in 
combinatorial game theory, which analyzes ideal play in perfect-informa- 
tion games. Then we survey results about the complexity of determining 
ideal play in these games, and the related problems of solving puzzles, in 
terms of both polynomial-time algorithms and computational intractabil- 
ity results. Our review of background and survey of algorithmic results 
are by no means complete, but should serve as a useful primer. 



1 Introduction 

Many classic games are known to be computationally intractable: one-player 
puzzles are often NP-complete (as in Minesweeper), and two-player games are 
often PSPACE-complete (as in Othello) or EXPTIME-complete (as in Checkers, 
Chess, and Go). Surprisingly, many seemingly simple puzzles and games are 
also hard. Other results are positive, proving that some games can be played 
optimally in polynomial time. In some cases, particularly with one-player puzzles, 
the computationally tractable games are still interesting for humans to play. 

After reviewing some of the basic concepts in combinatorial game theory 
in Section 2, Sections 3-5 survey several of these algorithmic and intractabil- 
ity results. We do not intend to give a complete survey, but rather to give an 
introduction to the area. Given the space restrictions, the sample of results
mentioned here reflects a personal bias: results about “well-known” games (in North
America), some of the results I find interesting, and results in which I have 
been involved. For a more complete overview, please see the full version of this 
paper [12]. 

Combinatorial game theory is to be distinguished from other forms of game 
theory arising in the context of economics. Economic game theory has applica- 
tions in computer science as well, most notably in the context of auctions [11] 
and analyzing behavior on the Internet [33]. 
2 Combinatorial Game Theory

A combinatorial game typically involves two players, often called Left and Right, 
alternating play in well-defined moves. However, in the interesting case of a 
combinatorial puzzle, there is only one player, and for cellular automata such 
as Conway’s Game of Life, there are no players. In all cases, no randomness or 
hidden information is permitted: all players know all information about gameplay 
(perfect information). The problem is thus purely strategic: how to best play the
game against an ideal opponent. 

It is useful to distinguish several types of two-player perfect-information 
games [3, pp. 16-17]. A common assumption is that the game terminates after 
a finite number of moves (the game is finite or short), and the result is a unique 
winner. Of course, there are exceptions: some games (such as Life and Chess) 
can be drawn out forever, and some games (such as tic-tac-toe and Chess) define 
ties in certain cases. However, in the combinatorial-game setting, it is useful to 
define the winner as the last player who is able to move; this is called normal 
play. If, on the other hand, the winner is the first player who cannot move, this 
is called misere play. (We will normally assume normal play.) A game is loopy 
if it is possible to return to previously seen positions (as in Chess, for example). 
Finally, a game is called impartial if the two players (Left and Right) are treated 
identically, that is, each player has the same moves available from the same game 
position; otherwise the game is called partizan. 

A particular two-player perfect-information game without ties or draws can 
have one of four outcomes as the result of ideal play: player Left wins, player 
Right wins, the first player to move wins (whether it is Left or Right), or the 
second player to move wins. One goal in analyzing two-player games is to deter- 
mine the outcome as one of these four categories, and to find a strategy for the 
winning player to win. Another goal is to compute a deeper structure of games,
described in the remainder of this section, called the value of the game.

A beautiful mathematical theory has been developed for analyzing two-player 
combinatorial games. The most comprehensive reference is the book Winning 
Ways by Berlekamp, Conway, and Guy [3], but a more mathematical presen- 
tation is the book On Numbers and Games by Conway [8]. See also [21] for 
a bibliography. The basic idea behind the theory is simple: a two-player game 
can be described by a rooted tree, each node having zero or more left branches
corresponding to options for player Left to move and zero or more right branches
corresponding to options for player Right to move; leaves correspond to finished
games, the winner being determined by either normal or misere play. The
interesting parts of combinatorial game theory are the several methods for ma- 
nipulating and analyzing such games/trees. We give a brief summary of some of 
these methods in this section. 

2.1 Conway’s Surreal Numbers 

A richly structured special class of two-player games are John H. Conway’s 
surreal numbers [8], a vast generalization of the real and ordinal number systems. 
Basically, a surreal number {L | R} is the “simplest” number larger than all Left
options (in L) and smaller than all Right options (in R); for this to constitute a
number, all Left and Right options must be numbers, defining a total order, and
each Left option must be less than each Right option. See [8] for more formal
definitions.

For example, the simplest number without any larger-than or smaller-than 
constraints, denoted { | }, is 0; the simplest number larger than 0 and without 
smaller-than constraints, denoted {0 | }, is 1. This method can be used to gener- 
ate all natural numbers and indeed all ordinals. On the other hand, the simplest 
number less than 0, denoted { | 0}, is −1; similarly, all negative integers can be
generated. Another example is the simplest number larger than 0 and smaller
than 1, denoted {0 | 1}, which is 1/2; similarly, all dyadic rationals can be
generated. After a countably infinite number of such construction steps, all real
numbers can be generated; after many more steps, the surreals are all numbers 
that can be generated in this way. 
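For finite numbers the “simplest number” rule can be computed directly. The sketch below is our own illustration, restricted to dyadic rational bounds: `lo` stands for the largest Left option and `hi` for the smallest Right option (None when there is no such option). It prefers the integer of smallest absolute value and otherwise the dyadic rational with the smallest denominator.

```python
import math
from fractions import Fraction

def simplest_between(lo=None, hi=None):
    """Simplest number x with lo < x < hi; None means that side is unbounded.

    Assumes lo < hi and that both bounds (if given) are dyadic rationals.
    """
    if lo is not None and hi is not None and not lo < hi:
        raise ValueError("need lo < hi")
    if lo is not None and lo >= 0:
        n = math.floor(lo) + 1            # smallest integer above lo
        if hi is None or n < hi:
            return Fraction(n)
        k = 1
        while True:                       # try denominators 2, 4, 8, ...
            candidate = Fraction(math.floor(lo * 2**k) + 1, 2**k)
            if candidate < hi:
                return candidate
            k += 1
    if hi is not None and hi <= 0:
        # mirror image of the positive case
        return -simplest_between(-hi, None if lo is None else -lo)
    return Fraction(0)                    # lo < 0 < hi, or no bounds at all

print(simplest_between(), simplest_between(0), simplest_between(None, 0), simplest_between(0, 1))
```

These four calls correspond to { | } = 0, {0 | } = 1, { | 0} = −1 and {0 | 1} = 1/2 from the text.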

What is interesting about the surreals from the perspective of combinatorial 
game theory is that they are a subclass of all two-player perfect-information 
games, and some of the surreal structure, such as addition and subtraction, 
carries over to general games. Furthermore, while games are not totally ordered, 
they can still be compared to some surreal numbers and, amazingly, how a game 
compares to the surreal number 0 determines exactly the outcome of the game. 
This connection is detailed in the next few paragraphs. 

First we define some algebraic structure of games that carries over from 
surreal numbers. Two-player combinatorial games, or trees, can simply be
represented as {L | R} where, in contrast to surreal numbers, no constraints are
placed on L and R. The negation of a game is the result of reversing the roles of
the players Left and Right throughout the game. The (disjunctive) sum of two
(sub)games is the game in which, at each player’s turn, the player has a binary
choice of which subgame to play, and makes a move in precisely that subgame.
A partial order is defined on games recursively: a game x is less than or equal to
a game y if no Left option of x is greater than or equal to y and no Right option
of y is less than or equal to x.

Note that while {−1 | 1} = 0 = { | } in terms of numbers, {−1 | 1} and { | }
denote different games (lasting 1 move and 0 moves, respectively), and in this
sense are equal in value but not identical symbolically or game-theoretically.
Nonetheless, the games {−1 | 1} and { | } have the same outcome: the second
player to move wins.

Amazingly, this holds in general: two equal numbers represent games with 
equal outcome (under ideal play). In particular, all games equal to 0 have the 
outcome that the second player to move wins. Furthermore, all games equal 
to a positive number have the outcome that the Left player wins; more gener- 
ally, all positive games (games larger than 0) have this outcome. Symmetrically, 
all negative games have the outcome that the Right player wins (this follows 
automatically by the negation operation). 
There is one outcome not captured by the characterization into zero, positive,
and negative games: the first player to move wins. An example of such a game
is {1 | 0}; this fails to be a surreal number because 1 > 0. By the claim above,
{1 | 0} || 0, that is, {1 | 0} is fuzzy with (incomparable to) 0. Indeed, {1 | 0} || x
for all surreal numbers x, 0 ≤ x ≤ 1. In contrast, x < {1 | 0} for all x < 0 and
{1 | 0} < x for all 1 < x. In general it holds that a game is fuzzy with some surreal
numbers in an interval [−n, n] but comparable with all surreals outside that interval.
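The recursive order and the resulting outcome classes can be evaluated mechanically for small games. In the sketch below (our own), a game is a pair of tuples (Left options, Right options), and x ≤ y holds when no Left option of x is ≥ y and no Right option of y is ≤ x; comparing a game with 0 in both directions gives its outcome.

```python
from functools import lru_cache

# A game is a pair (left_options, right_options); each is a tuple of games.
ZERO = ((), ())               # { | }
ONE = ((ZERO,), ())           # {0 | }
NEG_ONE = ((), (ZERO,))       # { | 0}
STAR = ((ZERO,), (ZERO,))     # {0 | 0}

@lru_cache(maxsize=None)
def le(x, y):
    """x <= y iff no Left option of x is >= y and no Right option of y is <= x."""
    return (all(not le(y, xl) for xl in x[0])
            and all(not le(yr, x) for yr in y[1]))

def outcome(g):
    """Outcome class of g under normal play, read off from comparing g with 0."""
    ge_zero, le_zero = le(ZERO, g), le(g, ZERO)
    if ge_zero and le_zero:
        return "second player wins"   # g = 0
    if ge_zero:
        return "Left wins"            # g > 0
    if le_zero:
        return "Right wins"           # g < 0
    return "first player wins"        # g || 0

print(outcome(((NEG_ONE,), (ONE,))))  # {-1 | 1} = 0: second player wins
print(outcome(((ONE,), (ZERO,))))     # {1 | 0} || 0: first player wins
```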

For brevity we omit many other useful notions in combinatorial game theory, 
such as additional definitions of summation, super-infinitesimal games * and ↑,
mass, temperature, thermographs, the simplest form of a game, remoteness, and 
suspense; see [3,8]. 

2.2 Sprague-Grundy Theory

A celebrated result in combinatorial game theory is the characterization of im- 
partial two-player perfect-information games, discovered independently in the 
1930’s by Sprague [39] and Grundy [25]. Recall that a game is impartial if it 
does not distinguish between the players Left and Right. The Sprague-Grundy 
theory [39,25,8,3] states that every finite impartial game is equivalent to an in- 
stance of the game of Nim, characterized by a single natural number n. This 
theory has since been generalized to all impartial games by generalizing Nim to 
all ordinals n; see [8,38]. 

Nim [5] is a game played with several heaps, each with a certain number of 
tokens. A Nim game with a single heap of size n is denoted by *n and is called a 
nimber. During each move a player can pick any pile and reduce it to any smaller 
nonnegative integer size. The game ends when all piles have size 0. Thus, a single 
pile *n can be reduced to any of the smaller piles *0, *1, . . . , *(n − 1). Multiple
piles in a game of Nim are independent, and hence any game of Nim is a sum of
single-pile games *n for various values of n. In fact, a game of Nim with k piles
of sizes $n_1, n_2, \ldots, n_k$ is equivalent to a one-pile Nim game *n, where n is the
binary XOR of $n_1, n_2, \ldots, n_k$. As a consequence, Nim can be played optimally
in polynomial time (polynomial in the encoding size of the pile sizes). 
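The XOR rule translates directly into an optimal strategy. The sketch below (function names are ours) computes the nimber of a position and, when it is nonzero, a move to a position of value 0.

```python
from functools import reduce
from operator import xor

def nim_value(heaps):
    """The single nimber *n equivalent to a Nim position: the XOR of the heap sizes."""
    return reduce(xor, heaps, 0)

def winning_move(heaps):
    """Return (heap index, new size) moving to a position of value 0, or None.

    None means the position already has value 0, so the player to move loses
    against ideal play.
    """
    total = nim_value(heaps)
    if total == 0:
        return None
    for i, h in enumerate(heaps):
        target = h ^ total
        if target < h:  # true for any heap containing the highest set bit of total
            return i, target
    return None  # not reached when total != 0

print(nim_value([3, 4, 5]), winning_move([3, 4, 5]))  # 2 (0, 1)
```

Both functions use a number of bit operations linear in the encoding size of the heaps, which is the polynomial-time bound stated above.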

Even more surprising is that every impartial two-player perfect-information 
game has the same value as a single-pile Nim game, *n for some n. The number n 
is called variously the G-value, Grundy-value, or Sprague-Grundy function of 
the game. It is easy to define: suppose that game x has k options $y_1, \ldots, y_k$ for
the first move (independent of which player goes first). By induction, we can
compute $y_1 = {*}n_1, \ldots, y_k = {*}n_k$. The theorem is that x equals *n where n
is the smallest natural number not in the set $\{n_1, \ldots, n_k\}$. This number n is
called the minimum excluded value or mex of the set. This description has also
assumed that the game is finite, but this is easy to generalize [8,38].
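The mex recursion can be written generically for any finite impartial game whose positions are hashable. In the sketch below, `take_1_to_3` is a hypothetical example game (remove 1, 2, or 3 tokens from one heap) used only to exercise the code.

```python
from functools import lru_cache

def mex(values):
    """Minimum excluded value: the smallest natural number not in values."""
    present = set(values)
    m = 0
    while m in present:
        m += 1
    return m

def grundy(position, moves):
    """Sprague-Grundy value of a finite impartial game.

    moves(position) returns the positions reachable in one move. Memoization
    evaluates each reachable position once, so the work is proportional to
    the number of reachable positions times the branching factor.
    """
    @lru_cache(maxsize=None)
    def g(pos):
        return mex(g(nxt) for nxt in moves(pos))
    return g(position)

def take_1_to_3(n):
    """Hypothetical example: remove 1, 2, or 3 tokens from a single heap of n."""
    return [n - k for k in (1, 2, 3) if k <= n]

print([grundy(n, take_1_to_3) for n in range(9)])  # [0, 1, 2, 3, 0, 1, 2, 3, 0]
```

The memoized recursion visits every reachable position, which is exactly the “obvious method” whose cost can be exponential in the input size.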

The Sprague-Grundy function can increase by at most 1 at each level of the 
game tree, and hence the resulting nimber is linear in the maximum number 
of moves that can be made in the game; the encoding size of the nimber is 
only logarithmic in this count. Unfortunately, computing the Sprague-Grundy 
function for a general game by the obvious method uses time linear in the number
of possible states, which can be exponential in the nimber itself. 

Nonetheless, the Sprague-Grundy theory is extremely helpful for analyzing 
impartial two-player games, and for many games there is an efficient algorithm to 
determine the nimber. Examples include Nim itself, Kayles, and various general- 
izations [27]; and Cutcake and Maundy Cake [3, pp. 26-29]. In all of these exam- 
ples, the Sprague-Grundy function has a succinct characterization (if somewhat 
difficult to prove); it can also be easily computed using dynamic programming. 
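As an example of such dynamic programming, here is a sketch for Kayles under its standard rules, which the text does not spell out: pins stand in a row, and a move knocks down one pin or two adjacent pins. A move splits a row into two independent rows, so by the sum rule their value is the XOR of the two row values.

```python
def kayles_values(n_max):
    """Sprague-Grundy values of Kayles rows of length 0..n_max.

    A move knocks down one pin or two adjacent pins, which leaves two
    independent rows of lengths a and b; their sum has value g[a] XOR g[b].
    """
    g = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        options = set()
        for removed in (1, 2):
            rest = n - removed
            for a in range(rest + 1):      # empty when rest < 0
                options.add(g[a] ^ g[rest - a])
        m = 0
        while m in options:               # mex of the option values
            m += 1
        g[n] = m
    return g

print(kayles_values(11))  # [0, 1, 2, 3, 1, 4, 3, 2, 1, 4, 2, 6]
```

A row of n pins is a first-player win exactly when its value is nonzero; the table costs O(n_max^2) operations.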

2.3 Strategy Stealing 

Another useful technique in combinatorial game theory for proving that a particular
player must win is strategy stealing. The basic idea is to assume that one player
has a winning strategy, and prove that in fact the other player has a winning
strategy based on that strategy. This contradiction proves that the assumed
player has no winning strategy, so in a game without ties the other player must
in fact have one; typically, the assumption concerns the second player, and the
conclusion is that the first player wins. An example of such an argument is
given in Section 3.1. Unfortunately, such a proof by contradiction gives
no indication of what the winning strategy actually is, only that it exists. In 
many situations, such as the one in Section 3.1, the winner is known but no 
polynomial-time winning strategy is known. 

2.4 Puzzles 

There is little theory for analyzing combinatorial puzzles (one-player games) 
along the lines of two-player theory summarized in this section. We present one 
such viewpoint here. In most puzzles, solutions subdivide into a sequence of 
moves. Thus, a puzzle can be viewed as a tree, similar to a two-player game except
that edges are not distinguished between Left and Right. The goal is to reach 
a position from which there are no valid moves (normal play). Loopy puzzles 
are common; to be more explicit, repeated subtrees can be converted into self- 
references to form a directed graph. 

A consequence of the above view is that a puzzle is basically an impartial two- 
player game except that we are not interested in the outcome from two players 
alternating in moves. Rather, questions of interest in the context of puzzles are 
(a) whether a given puzzle is solvable, and (b) finding the solution with the fewest 
moves. An important open direction of research is to develop a general theory 
for resolving such questions, similar to the two-player theory. For example, using
the analogy with impartial two-player games described above, the notion of
sums of puzzles makes sense, although it is not clear that it plays a similarly key 
role as with games. 
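Questions (a) and (b) can always be answered by exhaustive search, although possibly in exponential time. The sketch below runs breadth-first search over a puzzle’s state graph; the visited set takes care of loopy puzzles. The toy puzzle is hypothetical: from n one may step to n − 1 or, if n is even, to n/2, and, as in normal play, the goal is a position with no valid moves.

```python
from collections import deque

def fewest_moves(start, moves, is_solved):
    """Breadth-first search over the puzzle's state graph.

    Returns the minimum number of moves from start to a solved state, or None
    if no solved state is reachable (the puzzle is unsolvable). The visited set
    handles loopy puzzles, where positions can repeat.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, dist = queue.popleft()
        if is_solved(state):
            return dist
        for nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def toy_moves(n):
    """Hypothetical puzzle: step to n - 1, or to n // 2 when n is even (n > 0)."""
    options = []
    if n > 0:
        options.append(n - 1)
        if n % 2 == 0:
            options.append(n // 2)
    return options

print(fewest_moves(10, toy_moves, lambda n: not toy_moves(n)))  # 5
```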

3 Algorithms for Two-Player Games 

Many nonloopy two-player games are PSPACE-complete. This is fairly natural
because games are closely related to boolean expressions with alternating
quantifiers (for which deciding satisfiability is PSPACE-complete): there exists
a move for Left such that, for all moves for Right, there exists another move for 
Left, etc. A PSPACE-completeness result has two consequences. First, being in 
PSPACE means that the game can be played optimally, and typically all posi- 
tions can be enumerated, using possibly exponential time but only polynomial 
space. Thus such games lend themselves to a somewhat reasonable exhaustive 
search for small enough sizes. Second, the games cannot be solved in polynomial 
time unless P = PSPACE, which is even “less likely” than P equaling NP. 
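The connection with alternating quantifiers can be made concrete. The evaluator below (our own sketch) decides a quantified boolean formula by depth-first search, exactly as one would search a game tree. Its recursion depth equals the number of variables, so memory stays polynomial, while its running time is exponential.

```python
from typing import Callable, Tuple

Assignment = Tuple[bool, ...]

def qbf_true(quantifiers: str, phi: Callable[[Assignment], bool],
             prefix: Assignment = ()) -> bool:
    """Decide Q1 x1 Q2 x2 ... Qm xm phi(x1, ..., xm) by depth-first search.

    quantifiers is a string over {"E", "A"} (exists / for all). Only the
    current partial assignment is stored, so memory is linear in m.
    """
    i = len(prefix)
    if i == len(quantifiers):
        return phi(prefix)
    branches = (qbf_true(quantifiers, phi, prefix + (v,)) for v in (False, True))
    return any(branches) if quantifiers[i] == "E" else all(branches)

# "There is a move for Left such that, for every move of Right, ..."
print(qbf_true("EA", lambda x: x[0] or x[1]))   # True: choose x1 = True
print(qbf_true("EA", lambda x: x[0] == x[1]))   # False
```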

On the other hand, loopy two-player games are often EXPTIME-complete.
Such a result is one of the few types of true lower bounds in complexity theory, 
implying that all algorithms require exponential time in the worst case. 

In this section we briefly survey some of these complexity results and related 
positive results, ordered roughly chronologically by the first result on a particular 
game. See also [18] for a related survey. Because of space constraints we omit 
discussion of games on graphs, as well as the board games Gobang, Shogi, and 
Othello. For details on these and other games, please refer to the full version of 
this paper [12]. 



3.1 Hex 

Hex [3, pp. 679-680] is a game designed by Piet Hein and
played on a diamond-shaped hexagonal board; see Fig. 1.
Players take turns filling in empty hexagons with their
color. The goal of a player is to connect the opposite sides
of their color with hexagons of their color. (In the figure,
one player is solid and the other player is dotted.) A game
of Hex can never tie, because if all hexagons are colored
arbitrarily, there is precisely one connecting path of an
appropriate color between opposite sides of the board.

Fig. 1. A 5 x 5 Hex board

Nash [3, p. 680] proved that the first player to move can win by using a 
strategy stealing argument (see Section 2.3). In contrast, Reisch [35] proved that 
determining the outcome of a general position in Hex is PSPACE-complete. 
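The no-tie property stated above can be tested empirically. The sketch below stores a board as an n x n grid in the usual rhombus coordinates, where cell (r, c) touches the six offsets listed in `NEIGHBOURS`. It assumes, as a labelling of our own, that player "A" joins the top and bottom rows and player "B" joins the left and right columns, and it checks that on completely filled random boards exactly one player connects.

```python
import random
from collections import deque

# Rhombus coordinates: cell (r, c) touches these six offsets.
NEIGHBOURS = ((0, -1), (0, 1), (-1, 0), (1, 0), (-1, 1), (1, -1))

def connects(board, player):
    """True if "A" links row 0 to row n-1, or "B" links column 0 to column n-1."""
    n = len(board)
    if player == "A":
        starts = [(0, c) for c in range(n)]
        goal_axis = 0        # done when the row index reaches n - 1
    else:
        starts = [(r, 0) for r in range(n)]
        goal_axis = 1        # done when the column index reaches n - 1
    queue = deque(cell for cell in starts if board[cell[0]][cell[1]] == player)
    seen = set(queue)
    while queue:
        cell = queue.popleft()
        if cell[goal_axis] == n - 1:
            return True
        r, c = cell
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < n and 0 <= nc < n and (nr, nc) not in seen
                    and board[nr][nc] == player):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False

# Fill random 5 x 5 boards completely and check that exactly one side connects.
rng = random.Random(0)
for _ in range(1000):
    board = [[rng.choice("AB") for _ in range(5)] for _ in range(5)]
    assert connects(board, "A") != connects(board, "B")
print("no ties found")
```

Such a check is only evidence, not a proof; the theorem itself follows from the topology of the hexagonal grid.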




3.2 Checkers (Draughts) 

The standard 8x8 game of Checkers (Draughts), like many 
classic games, is finite and hence can be played optimally in 
constant time (in theory). The complexity of playing in a 
general n x n board from a natural starting position, such as 
the one in Fig. 2, is open. However, deciding the outcome of an 
arbitrary configuration is PSPACE-hard [22]. If a polynomial 
bound is placed on the number of moves that are allowed in 
between jumps (which is a reasonable generalization of the 
drawing rule in standard Checkers [22]), then the problem is 
in PSPACE and hence is PSPACE-complete. Without such a
restriction, however, Checkers is EXPTIME-complete [37].





Fig. 2. A natural starting configuration for 10 x 10 Checkers, from [22]
On the other hand, certain simple questions about Checkers can be answered
in polynomial time [22,13]. Can one player remove all the other player’s pieces in
one move (by several jumps)? Can one player king a piece in one move? Because
of the notion of parity on n x n boards, these questions reduce to checking
the existence of an Eulerian path or general path, respectively, in a particular
directed graph; see [22,13]. However, for boards defined by general graphs, at
least the first question becomes NP-complete [22].
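The Eulerian-path test mentioned above is a standard graph computation. The sketch below checks the classical criterion for a directed multigraph: at most one vertex with one more outgoing than incoming edge, at most one with the reverse, all others balanced, and all edges in one weakly connected component. Building the jump graph from an actual Checkers position is not shown; it depends on the constructions in [22,13].

```python
from collections import defaultdict

def has_eulerian_path(edges):
    """Directed multigraph given as (u, v) pairs; True if one walk uses every edge once."""
    if not edges:
        return True
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    undirected = defaultdict(set)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
        undirected[u].add(v)
        undirected[v].add(u)
    starts = ends = 0
    for node in undirected:
        diff = out_deg[node] - in_deg[node]
        if diff == 1:
            starts += 1
        elif diff == -1:
            ends += 1
        elif diff != 0:
            return False
    if (starts, ends) not in {(0, 0), (1, 1)}:
        return False
    # every vertex touching an edge must lie in one weakly connected component
    first = next(iter(undirected))
    seen, stack = {first}, [first]
    while stack:
        for nxt in undirected[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(undirected)

print(has_eulerian_path([("a", "b"), ("b", "c"), ("c", "a")]))  # True
```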



3.3 Go 




Presented at the same conference as the Checkers result in the previous section (FOCS’78), Lichtenstein
and Sipser [32] proved that the classic oriental game
of Go is also PSPACE-hard for an arbitrary configuration
on an n x n board. This proof does not involve
any situations called ko’s, where a rule must be invoked
to avoid infinite play. In contrast, Robson [36]
proved that Go is EXPTIME-complete when ko’s are
involved, and indeed used judiciously. The type of ko used in this reduction is
shown in Fig. 3. When one of the players makes a move shown in the figure, the
ko rule prevents (in particular) the other move shown in the figure from being made
immediately afterwards.

Recently, Wolfe [41] has shown that even Go endgames are PSPACE-hard. 
More precisely, a Go endgame is when the game has reduced to a sum of Go 
subgames, each equal to a polynomial-size game tree. This proof is based on 
several connections between Go and combinatorial game theory detailed in a 
book by Berlekamp and Wolfe [2].



3.4 Chess 

Fraenkel and Lichtenstein [23] proved that a generalization of the classic game
Chess to n x n boards is EXPTIME-complete. Specifically, their generalization
has a unique king of each color, and for each color the numbers of pawns, bish- 
ops, rooks, and queens increase as some fractional power of n. (Knights are not 
needed.) The initial configuration is unspecified; what is EXPTIME-hard is to 
determine the winner (who can checkmate) from an arbitrary specified configu- 
ration. 



3.5 Hackenbush 

Hackenbush is one of the standard examples of a combinatorial game in Winning 
Ways; see e.g. [3, pp. 4-9]. A position is given by a graph with each edge colored
either red (Left), blue (Right), or green (neutral), and with certain vertices 
marked as rooted. Players take turns removing an edge of an appropriate color 
(either neutral or their own color), which also causes all edges not connected 
to a rooted vertex to be removed. The winner is determined by normal play.
Chapter 7 of Winning Ways [3, pp. 183-220] proves that determining the value 
of a red-blue Hackenbush position is NP-hard. 
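Although general red-blue Hackenbush positions are hard, a single path (a “stalk”) growing from the ground has a simple value. The sketch below applies Berlekamp’s well-known rule for such strings, writing 'L' for a Left edge and 'R' for a Right edge to avoid committing to a colour convention: the initial run of one colour counts ±1 per edge, and the edges after the first colour change count ±1/2, ±1/4, and so on. It illustrates what the value of a position means and plays no role in the NP-hardness proof.

```python
from fractions import Fraction

def stalk_value(colors):
    """Value of a red-blue Hackenbush stalk, edges listed from the ground up.

    'L' edges belong to Left (positive), 'R' edges to Right (negative).
    """
    value = Fraction(0)
    i = 0
    while i < len(colors) and colors[i] == colors[0]:   # initial run: +-1 each
        value += 1 if colors[i] == "L" else -1
        i += 1
    weight = Fraction(1, 2)
    for c in colors[i:]:                                 # then +-1/2, +-1/4, ...
        value += weight if c == "L" else -weight
        weight /= 2
    return value

print(stalk_value("LLR"), stalk_value("LRL"))  # 3/2 3/4
```

A position made of several separate stalks is their disjunctive sum, so its value is the sum of the stalk values.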

3.6 Domineering (Crosscram) 

Domineering or crosscram [3, pp. 117-124] is a partizan game involving place- 
ment of horizontal and vertical dominoes in a grid; a typical starting position 
is an m x n rectangle. Left can play only vertical dominoes and Right can play
only horizontal dominoes, and dominoes must remain disjoint. The winner is 
determined by normal play. 

The complexity of Domineering, computing either the outcome or the value 
of a position, remains open. Lachmann, Moore, and Rapaport [31] have shown 
that the winner and a winning strategy can be computed in polynomial time 
for m ∈ {1,2,3,4,5,7,9,11} and all n. These algorithms do not compute the
value of the game, nor the optimal strategy, only a winning strategy. We omit 
discussion of the related game Cram [3, pp. 468-472]. 
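For very small boards the outcome of Domineering can be found by exhaustive search. The sketch below is our own and is exponential in the board size, so it says nothing about the open complexity question; it only makes the outcome classes concrete.

```python
from functools import lru_cache

def domineering_outcome(rows, cols):
    """Outcome class of an empty rows x cols Domineering board by exhaustive search.

    Left places vertical dominoes and Right horizontal ones; the player unable
    to move loses (normal play).
    """
    step = {"Left": (1, 0), "Right": (0, 1)}
    other = {"Left": "Right", "Right": "Left"}

    def placements(free, player):
        dr, dc = step[player]
        for (r, c) in free:
            partner = (r + dr, c + dc)
            if partner in free:
                yield free - {(r, c), partner}

    @lru_cache(maxsize=None)
    def mover_wins(free, player):
        # the player to move wins iff some move leaves the opponent in a losing position
        return any(not mover_wins(nxt, other[player]) for nxt in placements(free, player))

    board = frozenset((r, c) for r in range(rows) for c in range(cols))
    left_first, right_first = mover_wins(board, "Left"), mover_wins(board, "Right")
    if left_first and right_first:
        return "first player wins"
    if left_first:
        return "Left wins"
    if right_first:
        return "Right wins"
    return "second player wins"

print(domineering_outcome(2, 2))  # first player wins
```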

3.7 Dots-and-Boxes and Strings-and-Coins

Dots-and-Boxes [1], [3, pp. 507-550] is a well-known children’s
game in which players take turns drawing horizontal and vertical
edges connecting pairs of dots in an m x n subset of the lattice.
Whenever a player makes a move that encloses a unit square
with drawn edges, the player is awarded a point and must then
draw another edge in the same move. The winner is the player
with the most points when the entire grid has been drawn. See
Fig. 4 for an example of a position.

A generalization arising from the dual of Dots-and-Boxes is Strings-and-Coins.
This game involves a sort of graph whose vertices are coins and whose edges are
strings. The coins may be tied to each other and to the “ground” by strings;
the latter connection can be modeled as a loop in the graph. Players alternate
cutting strings (removing edges), and if a coin is thereby freed, that player
collects the coin and cuts another string in the same move. The player to
collect the most coins wins.

Winning Ways [3, pp. 543-544] describes a proof that Strings-and-Coins 
endgames are NP-hard. Eppstein [18] observes that this reduction should also 
apply to endgame instances of Dots-and-Boxes. 

Fig. 4. A Dots-and-Boxes endgame

3.8 Amazons

Amazons is a game invented by Walter Zamkauskas in 1988, containing elements
of Chess and Go. Gameplay takes place on a 10 × 10 board with four amazons of
each color arranged as in Fig. 5 (left). In each turn, Left [Right] moves a black
[white] amazon to any unoccupied square accessible by a Chess queen’s move,
and fires an arrow to any unoccupied square reachable by a Chess queen’s move
from the amazon’s new position. The arrow (drawn as a circle) now occupies its
square; amazons and shots can no longer pass over or land on this square. The
winner is determined by normal play.
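A move in Amazons combines two queen-style rays: one for the amazon, and one for its arrow, which is fired after the amazon has left its old square. The Python sketch below enumerates these moves on a square board. Representing occupied squares (amazons and arrows alike) as a set of coordinates, and the function names, are assumptions made for illustration.

```python
DIRS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def queen_targets(occupied, size, r, c):
    """Squares reachable from (r, c) by a queen move on a size x size board.
    A ray stops before the first occupied square (amazon or arrow)."""
    out = []
    for dr, dc in DIRS:
        rr, cc = r + dr, c + dc
        while 0 <= rr < size and 0 <= cc < size and (rr, cc) not in occupied:
            out.append((rr, cc))
            rr, cc = rr + dr, cc + dc
    return out


def amazon_moves(occupied, size, amazons):
    """All (source, destination, arrow) triples for the player whose
    amazons stand on the squares listed in `amazons`."""
    moves = []
    for src in amazons:
        for dst in queen_targets(occupied, size, *src):
            # The amazon has left src, so the arrow may pass over or land on it.
            after = (set(occupied) - {src}) | {dst}
            for arrow in queen_targets(after, size, *dst):
                moves.append((src, dst, arrow))
    return moves
```

When every component of the board holds amazons of a single color, counting how long such moves can continue in each component is exactly the endgame question studied by Buro [7].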

Gameplay in Amazons typically splits into a sum of simpler games because
arrows partition the board into multiple components. In particular, the endgame
begins when each component of the game contains amazons of only a single color.
Then the goal of each player is simply to maximize the number of moves in each
component. Buro [7] proved that maximizing the number of moves in a single
component is NP-complete (for n × n boards). In a general endgame, deciding the
outcome may not be in NP because it is difficult to prove that the opponent has
no better strategy. However, Buro [7] proved that this problem is NP-equivalent
[24], i.e., the problem can be solved by a polynomial number of calls to an
algorithm for any NP-complete problem, and vice versa.

It remains open whether deciding the outcome of a general Amazons position 
is PSPACE-hard. The problem is in PSPACE because the number of moves in 
a game is at most the number of squares in the board. 
Fig. 5. The initial position in Amazons (left) and black trapping a white
amazon (right)



3.9 Phutball

Fig. 6. A single move in Phutball consisting of four jumps

Conway’s game of Philosopher’s Football or Phutball [3, pp. 688-691] involves
white and black stones on a rectangular grid such as a Go board. Initially, the
unique black stone (the ball) is placed in the middle of the board, and there
are no white stones. Players take turns either placing a white stone in any
unoccupied position, or moving the ball by a sequence of jumps over consecutive
sequences of white stones each arranged horizontally, vertically, or diagonally.
See Fig. 6. A jump causes immediate removal of the white stones jumped over, so
those stones cannot be used for a future jump in the same move. Left and Right
have opposite sides of the grid marked as their goal lines. Left’s goal is to
end a move with the ball on or beyond Right’s goal line, and symmetrically for
Right.

Phutball is inherently loopy and it is not clear that either player has a winning 
strategy: the game may always be drawn out indefinitely. One counterintuitive 
aspect of the game is that white stones placed by one player may be “corrupted” 
for better use by the other player. Recently, however, Demaine, Demaine, and 
Eppstein [13] found an aspect of Phutball that could be analyzed. Specifically,
they proved that determining whether the current player can win in a single move 
(“mate in 1” in Chess) is NP-complete. This result leaves open the complexity 
of determining the outcome of a given game position. 
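The “mate in 1” question amounts to searching over all jump sequences of a single move, removing jumped stones as the search proceeds. The Python sketch below does this by depth-first search. It simplifies the rules in two labeled ways: the ball must land on the board, and reaching a given row (the hypothetical parameter goal_row) counts as reaching the goal line. The search can take exponential time. This is consistent with, but does not prove, the NP-completeness result of [13].

```python
DIRECTIONS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def ball_destinations(whites, ball, rows, cols):
    """Every square on which the ball can end one move of one or more jumps.
    Simplification: landing squares must lie on the rows x cols board."""
    results = set()

    def search(stones, pos):
        r, c = pos
        for dr, dc in DIRECTIONS:
            jumped = []
            rr, cc = r + dr, c + dc
            while (rr, cc) in stones:  # a consecutive run of white stones
                jumped.append((rr, cc))
                rr, cc = rr + dr, cc + dc
            if not jumped or not (0 <= rr < rows and 0 <= cc < cols):
                continue
            results.add((rr, cc))
            # Jumped stones are removed at once, so later jumps cannot reuse them.
            search(stones - frozenset(jumped), (rr, cc))

    search(frozenset(whites), ball)
    return results


def wins_in_one(whites, ball, rows, cols, goal_row):
    """'Mate in 1', treating arrival on goal_row as reaching the goal line."""
    return any(r == goal_row for r, _ in ball_destinations(whites, ball, rows, cols))
```

Each recursive call removes at least one stone, so the search always terminates.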

4 Algorithms for Puzzles 

Many puzzles (one-player games) have short solutions and are NP-complete.
However, several puzzles based on motion-planning problems are harder, although
they are often only PSPACE-complete because they take place in a bounded region.
When generalized to the entire plane and unboundedly many pieces, however,
puzzles often become undecidable.

This section briefly surveys some of these results, following the structure of 
the previous section. Again, because of space constraints, we omit discussion 
of several puzzles: Instant Insanity, Cryptarithms, Peg Solitaire, and Shanghai. 
For details on these and other puzzles, please refer to the full version of this 
paper [12]. 



4.1 Sliding Blocks 

The Fifteen Puzzle [3, p. 756] is a classic puzzle consisting of 15 numbered square 
blocks in a 4 × 4 grid; one square in the grid is a hole which permits blocks to
slide. The goal is to order the blocks in increasing English reading order.
See [29] for the history of this puzzle. 

A natural generalization of the Fifteen Puzzle is the n² − 1 puzzle on an n × n
grid. It is easy to determine whether a configuration of the n² − 1 puzzle can
reach another: the two permutations of the block numbers (in reading order)
simply need to match in parity, that is, whether the number of inversions
(out-of-order pairs) is even or odd. However, finding a solution using the
fewest slides is NP-complete [34]. It is also NP-hard to approximate within an
additive constant, but there is a polynomial-time constant-factor approximation
[34].
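The parity test can be sketched in a few lines of Python. One detail matters. On boards of even width, a vertical slide moves a block past an odd number of others and so flips the inversion parity, so the sketch folds the hole’s row into the invariant. On odd-width boards, the inversion parity of the blocks alone is invariant. Configurations are given as lists in reading order with 0 marking the hole. This encoding and the function names are chosen here for illustration.

```python
def inversions(seq):
    """Number of out-of-order pairs in seq (quadratic-time count)."""
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq)) if seq[i] > seq[j])


def same_class(a, b, n):
    """True if configurations a and b of the n*n - 1 puzzle can reach each other.
    a, b: lists of length n*n in reading order, with 0 marking the hole."""

    def invariant(cfg):
        parity = inversions([x for x in cfg if x != 0]) % 2
        if n % 2 == 0:
            # On even-width boards a vertical slide flips the inversion parity
            # and also moves the hole by one row, so combine the two.
            parity ^= (cfg.index(0) // n) % 2
        return parity

    return invariant(a) == invariant(b)
```

For instance, swapping blocks 14 and 15 in the solved Fifteen Puzzle produces a configuration in the other class, whereas any configuration one slide away from the solved one stays in the same class.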

A harder sliding-block puzzle is Rush Hour, distributed by Binary Arts, Inc. 
Several 1 × 2, 1 × 3, 2 × 1, and 3 × 1 rectangles are arranged in an m × n
grid. Horizontally oriented blocks can slide left and right, and vertically oriented
blocks can slide up and down, provided the blocks remain disjoint. The goal is 
to remove a particular block from the puzzle via an opening in the bounding 
rectangle. Recently, Flake and Baum [20] proved that this formulation of Rush 
Hour is PSPACE-complete. 

A classic reference on a wide class of sliding-block puzzles is by Hordern [29].
One general form of these puzzles is that rectangular blocks are placed in a rect- 
angular box, and each block can be moved horizontally and vertically, provided 
the blocks remain disjoint. The goal is to re-arrange one configuration into an- 
other. To my knowledge, the complexity of deciding whether such puzzles are 
solvable remains open. A simple observation is that, as with Rush Hour, they 
are all in PSPACE. 
4.2 Minesweeper

Minesweeper is a well-known imperfect-information computer puzzle popularized 
by its inclusion in Microsoft Windows. Gameplay takes place on an n × n board,
and the player does not know which squares contain mines. A move consists
of uncovering a square; if that square contains a mine, the player loses, and
otherwise the number of mines in the 8 adjacent squares is revealed to the player.
The player also knows the total number of mines. 

There are several problems of interest in Minesweeper. For example, given a 
configuration of partially uncovered squares (each marked with the number of 
adjacent mines), is there a position that can be safely uncovered? More generally, 
what is the probability that a given square contains a mine, assuming a uniform 
distribution of remaining mines? A different generalization of the first question 
is whether a given configuration is consistent, i.e., can be realized by a collection 
of mines. A consistency checker would allow testing whether a square can be 
guaranteed to be free of mines, thus answering the first question. A final problem 
is to decide whether a given configuration has a unique realization. 

Kaye [30] proved that testing consistency is NP-complete. This result leaves 
open the complexity of the other questions mentioned above. 
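A brute-force consistency checker can be sketched as follows, together with the reduction described above from “which squares are guaranteed safe?” to consistency. The grid encoding is an assumption for illustration: digits for uncovered squares, ‘?’ for covered ones, and ‘*’ for a square assumed to hold a mine. The function names are likewise illustrative. The search is exponential in the number of covered squares, which is unsurprising given Kaye’s NP-completeness result [30].

```python
from itertools import product


def consistent(grid, total_mines=None):
    """Brute-force consistency check for a Minesweeper configuration.

    grid: list of equal-length strings. A digit is an uncovered square showing
    its number of adjacent mines, '?' is covered, and '*' is assumed to be a
    mine. Returns True if some placement of mines on the '?' squares agrees
    with every digit (and with total_mines, if given).
    """
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    covered = [(r, c) for r, c in cells if grid[r][c] == "?"]
    forced = {(r, c) for r, c in cells if grid[r][c] == "*"}
    numbers = [(r, c, int(grid[r][c])) for r, c in cells if grid[r][c].isdigit()]
    for bits in product((False, True), repeat=len(covered)):
        mines = forced | {cell for cell, is_mine in zip(covered, bits) if is_mine}
        if total_mines is not None and len(mines) != total_mines:
            continue
        if all(
            sum((r + dr, c + dc) in mines for dr in (-1, 0, 1) for dc in (-1, 0, 1)) == k
            for r, c, k in numbers
        ):
            return True
    return False


def guaranteed_safe(grid, total_mines=None):
    """Covered squares that hold no mine in any consistent placement:
    forcing a mine onto such a square makes the configuration inconsistent."""
    safe = []
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == "?":
                trial = grid[:r] + [row[:c] + "*" + row[c + 1:]] + grid[r + 1:]
                if not consistent(trial, total_mines):
                    safe.append((r, c))
    return safe
```

If the input configuration is itself inconsistent, every covered square is reported safe vacuously, so a caller would normally check consistency first.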

4.3 Pushing Blocks 

Similar in spirit to the sliding-block puzzles in Section 4.1 are pushing-block 
puzzles. In sliding-block puzzles, an exterior agent can move arbitrary blocks 
around, whereas pushing-block puzzles embed a robot that can only move adja- 
cent blocks but can also move itself within unoccupied space. The study of this 
type of puzzle was initiated by Wilfong [40], who proved that deciding whether
the robot can reach a desired target is NP-hard when the robot can push and 
pull L-shaped blocks. 

Since Wilfong’s work, research has concentrated on the simpler model in 
which the robot can only push blocks and the blocks are unit squares. Types of 
puzzles are further distinguished by how many blocks can be pushed at once, 
whether blocks can additionally be defined to be unpushable or fixed (tied to 
the board), how far blocks move when pushed, and the goal (usually for the 
robot to reach a particular location). Dhagat and O’Rourke [17] initiated the 
exploration of square-block puzzles by proving that Push-*, in which arbitrarily 
many blocks can be pushed at once, is NP-hard with fixed blocks. Bremner, 
O’Rourke, and Shermer [6] strengthened this result to PSPACE-completeness. 
Recently, Hoffmann [28] proved that Push-* is NP-hard even without fixed 
blocks, but it remains open whether it is in NP or PSPACE-complete. 

Several other results allow only a single block to be pushed at once. In this 
context, fixed blocks are less crucial because a 2 × 2 cluster of blocks can never
be disturbed. A well-known computer puzzle in this context is Sokoban, where 
the goal is to place each block onto any one of the designated target squares. 
This puzzle was proved PSPACE-complete by Culberson [10]. A simpler puzzle, 
called Push-1, arises when the goal is simply for the robot to reach a particular
position. Demaine, Demaine, and O’Rourke [14] have proved that this puzzle is
NP-hard, but it remains open whether it is in NP or PSPACE-complete.

A variation on the Push series of puzzles, called PushPush, is when a block 
always slides as far as possible when pushed. The NP-hardness of these versions
follows from [14,28]. Another variation, called Push-X, disallows the robot
from revisiting a square (the robot’s path cannot cross itself). This direction
was suggested in [14] because it immediately places the puzzles in NP. Recently,
Demaine and Hoffmann [16] proved that Push-1X and PushPush-1X are NP-complete.
Hoffmann’s reduction for Push-* also establishes NP-completeness of Push-*X
without fixed blocks.
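For small instances, the Push-1 and PushPush-1 rules can be explored by breadth-first search over pairs (robot position, set of block positions), as in the following Python sketch. The grid symbols, the function name, and the convention that the robot steps into the square its pushed block vacated are assumptions made for illustration. The number of states grows exponentially with the number of blocks, so this is not an efficient decision procedure.

```python
from collections import deque

MOVES = ((0, 1), (1, 0), (0, -1), (-1, 0))


def robot_can_reach(grid, slide=False):
    """Breadth-first search over (robot, blocks) states.

    grid symbols: '#' fixed block, 'B' pushable unit block, 'R' robot,
    'T' target square, '.' empty. Only one block may be pushed at a time
    (Push-1); with slide=True a pushed block slides as far as possible
    (PushPush-1). The robot itself moves one square per step.
    """
    rows, cols = len(grid), len(grid[0])
    fixed, blocks = set(), set()
    robot = target = None
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == "#":
                fixed.add((r, c))
            elif ch == "B":
                blocks.add((r, c))
            elif ch == "R":
                robot = (r, c)
            elif ch == "T":
                target = (r, c)

    def free(p, occupied):
        return 0 <= p[0] < rows and 0 <= p[1] < cols and p not in fixed and p not in occupied

    start = (robot, frozenset(blocks))
    seen, queue = {start}, deque([start])
    while queue:
        pos, occupied = queue.popleft()
        if pos == target:
            return True
        for dr, dc in MOVES:
            nxt = (pos[0] + dr, pos[1] + dc)
            if nxt in occupied:
                dest = (nxt[0] + dr, nxt[1] + dc)
                if not free(dest, occupied):
                    continue  # the square beyond must be empty: one block moves
                if slide:
                    while free((dest[0] + dr, dest[1] + dc), occupied):
                        dest = (dest[0] + dr, dest[1] + dc)
                occupied_next = (occupied - {nxt}) | {dest}
            elif free(nxt, occupied):
                occupied_next = occupied
            else:
                continue
            state = (nxt, occupied_next)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False
```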



4.4 Clickomania (Same Game) 

Fig. 7. The falling rules for removing a group in Clickomania. Can you remove
all the remaining blocks?

Clickomania or Same Game [4] is a computer puzzle consisting of a rectangular
grid of square blocks each colored one of k colors. Horizontally and vertically
adjacent blocks of the same color are considered part of the same group. A move
selects a group containing at least two blocks and removes those blocks,
followed by two “falling” rules; see Fig. 7 (top). First, any blocks remaining
above created holes fall down in each column. Second, any empty columns are
removed by sliding the succeeding columns left.
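The two falling rules are easy to express if the board is stored column by column. In the Python sketch below, each column is a list of colors from bottom to top. This representation and the function name click are illustrative choices, not taken from [4]. Deleting a group’s blocks from their column lists implements the first rule, and dropping empty lists implements the second.

```python
def click(columns, col, row):
    """Apply one Clickomania move at (col, row).

    columns[i] lists the colors of column i from bottom to top. Returns the
    new board, or None if the chosen group has fewer than two blocks.
    """
    color = columns[col][row]
    group, stack = {(col, row)}, [(col, row)]
    while stack:  # flood fill over horizontally/vertically adjacent blocks
        c, r = stack.pop()
        for nc, nr in ((c + 1, r), (c - 1, r), (c, r + 1), (c, r - 1)):
            if (0 <= nc < len(columns) and 0 <= nr < len(columns[nc])
                    and (nc, nr) not in group and columns[nc][nr] == color):
                group.add((nc, nr))
                stack.append((nc, nr))
    if len(group) < 2:
        return None
    # Rule 1: blocks above a hole fall, i.e. each column keeps its survivors in order.
    survivors = [[x for r, x in enumerate(column) if (c, r) not in group]
                 for c, column in enumerate(columns)]
    # Rule 2: empty columns disappear and the succeeding columns slide left.
    return [column for column in survivors if column]
```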

The main goal in Clickomania is to remove all the blocks. Biedl et al. [4] 
proved that deciding whether this is possible is NP-complete. This complexity 
result holds even for puzzles with two columns and five colors, and for puzzles 
with five columns and three colors. On the other hand, for puzzles with one 
column (or, equivalently, one row) and arbitrarily many colors, they show that 
the maximum number of blocks can be removed in polynomial time. In particular,
the puzzles whose blocks can all be removed are given by the context-free
grammar S → Λ | SS | cSc | cScSc, where c ranges over all colors.
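Membership in this grammar can be tested by dynamic programming over intervals, in the style of context-free parsing. The Python sketch below is one such test, written here for illustration rather than taken from [4]. It runs in cubic time in the number of blocks in the column.

```python
from functools import lru_cache


def column_clearable(colors):
    """True if a one-column Clickomania puzzle (colors listed in order) can be
    emptied, i.e. if colors derives from S -> (empty) | SS | cSc | cScSc."""
    s = tuple(colors)

    @lru_cache(maxsize=None)
    def derives(i, j):  # can the interval s[i:j] be derived from S?
        if i == j:
            return True  # S -> empty string
        # S -> SS, splitting into two nonempty parts
        if any(derives(i, k) and derives(k, j) for k in range(i + 1, j)):
            return True
        if j - i >= 2 and s[i] == s[j - 1]:
            # S -> cSc
            if derives(i + 1, j - 1):
                return True
            # S -> cScSc, with the middle c at position m
            return any(s[m] == s[i] and derives(i + 1, m) and derives(m + 1, j - 1)
                       for m in range(i + 1, j - 1))
        return False

    return derives(0, len(s))
```

For example, it accepts abbabba (remove both bb pairs, then aaa) and rejects aabaa, where the b can never join a group.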

Various cases of Clickomania remain open, for example, puzzles with two 
colors, and puzzles with O(1) rows. Richard Nowakowski suggested a two-player
version of Clickomania, described in [4], in which players take turns removing 
groups and normal play determines the winner; the complexity of this game 
remains open. 

4.5 Moving Coins 

Several coin-sliding and coin-moving puzzles fall into the following general frame- 
work: re-arrange one configuration of unit disks in the plane into another config- 
uration by a sequence of moves, each repositioning a coin in an empty position 
that touches at least two other coins. Examples of such puzzles are shown in 
Fig. 8. This framework can be further generalized to nongeometric puzzles in- 
volving movement of tokens on graphs with adjacency restrictions. 
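Once coins are restricted to a lattice, the move rule can be checked mechanically. In the Python sketch below, coins are lattice points, and two coins touch when their points are lattice neighbors: 4 neighbors on the square lattice, 6 on the triangular lattice in axial coordinates. A move lifts one coin and drops it on an empty point touching at least two of the remaining coins. These encodings and the function name are assumptions for illustration.

```python
SQUARE = ((1, 0), (-1, 0), (0, 1), (0, -1))
# Triangular lattice in axial coordinates: six neighbours per point.
TRIANGULAR = ((1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1))


def legal_moves(coins, offsets):
    """All (src, dst) moves: lift the coin at src and place it on an empty
    lattice point dst that touches at least two of the other coins."""
    coins = set(coins)
    moves = []
    for src in coins:
        rest = coins - {src}
        candidates = {(x + dx, y + dy) for x, y in rest for dx, dy in offsets} - coins
        for dst in candidates:
            touching = sum((dst[0] + dx, dst[1] + dy) in rest for dx, dy in offsets)
            if touching >= 2:
                moves.append((src, dst))
    return moves
```

On the square lattice, an L of three coins admits only one move (completing the 2 × 2 square), while a triangle of three coins on the triangular lattice admits three.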
Coin-moving puzzles are analyzed by Demaine, Demaine, and Verrill [15]. In
particular, they study puzzles as in Fig. 8 in which the coins’ centers remain
on either the triangular lattice or the square lattice. Surprisingly, their
results for deciding solvability of puzzles are positive.

For the triangular lattice, nearly all puzzles are solvable, and there is a
polynomial-time algorithm characterizing them. For the square lattice, there
are more stringent constraints. For example, the bounding box cannot increase
by moves; more generally, the set of positions reachable by moves given an
infinite supply of extra coins (the span) cannot increase. Demaine, Demaine,
and Verrill show that, subject to this constraint, there is a polynomial-time
algorithm to solve all puzzles with at least two extra coins past what is
required to achieve the span. (In particular, all such puzzles are solvable.)

5 Cellular Automata and Life 

Conway’s Game of Life is a zero-player cellular automaton played on the square 
tiling of the plane. Initially, certain cells (squares) are marked alive or dead.
Each move globally evolves the cells: a live cell remains alive if between 2 and
3 of its 8 neighbors were alive, and a dead cell becomes alive if it had precisely
3 live neighbors. Chapter 25 of Winning Ways [3, pp. 817-850] proves that no
algorithm can decide whether an initial configuration of Life will ever
completely die out. Moreover, the same question about Life restricted within a
polynomially bounded region is PSPACE-complete. Several other cellular automata,
with different survival and birth rules, have been studied; see, e.g., [42].
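One generation of Life follows directly from the rule above. The Python sketch below stores the live cells as a set of coordinates, so the plane is unbounded. The helper dies_out_within, a name chosen here, can only confirm that a pattern dies out within a fixed number of steps. By the undecidability result just cited, no such simulation can settle the general question.

```python
from collections import Counter


def life_step(live):
    """One generation of Conway's Life on the unbounded square grid.
    live: set of (x, y) cells that are currently alive."""
    neighbours = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth with exactly 3 live neighbours; survival with 2 or 3.
    return {cell for cell, k in neighbours.items() if k == 3 or (k == 2 and cell in live)}


def dies_out_within(live, max_steps):
    """Simulate at most max_steps generations and report whether the pattern
    vanished. A False answer says nothing about later generations."""
    for _ in range(max_steps):
        if not live:
            return True
        live = life_step(live)
    return not live
```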

6 Open Problems 

Many open problems remain in combinatorial game theory. Guy [26] has compiled
a list of such problems (some of which have since been solved). An example
of a difficult unsolved problem is Conway’s angel-devil game [9].

Fig. 8. Coin-moving puzzles in which each move places a coin adjacent to two
other coins; in the bottom two puzzles, the coins must also remain on the
square lattice. The top two puzzles are classic, whereas the bottom two puzzles
were designed in [15]. (a) Turn the pyramid upside-down in three moves.
(b) Re-arrange the pyramid into a line in seven moves. (c) Flip the diagonal
in 18 moves. (d) Invert the V in 24 moves.
Many open problems also remain on the algorithmic side, and have been
mentioned throughout this paper. Examples of games and puzzles whose com- 
plexities remain completely open, to my knowledge, are Toads and Frogs [19], [3, 
pp. 14-15], Domineering (Section 3.6), and rectangular sliding-block puzzles 
(Section 4.1). For many other games and puzzles, such as Dots and Boxes (Sec- 
tion 3.7) and pushing-block puzzles (Section 4.3), some hardness results are 
known, but the exact complexity remains unresolved. More generally, an inter- 
esting direction for future research is to build a more comprehensive theory for 
analyzing combinatorial puzzles. 



References 

1. E. Berlekamp. The Dots and Boxes Game: Sophisticated Child’s Play. A. K. Peters,
Ltd., 2000.

2. E. Berlekamp and D. Wolfe. Mathematical Go: Chilling Gets the Last Point.
A. K. Peters, Ltd., 1994.

3. E. R. Berlekamp, J. H. Conway, and R. K. Guy. Winning Ways. Academic Press,
London, 1982.

4. T. C. Biedl, E. D. Demaine, M. L. Demaine, R. Fleischer, L. Jacobsen, and
J. I. Munro. The complexity of Clickomania. In R. J. Nowakowski, ed., More
Games of No Chance, 2001. To appear.

5. C. L. Bouton. Nim, a game with a complete mathematical theory. Ann. of
Math. (2), 3:35-39, 1901-02.

6. D. Bremner, J. O’Rourke, and T. Shermer. Motion planning amidst movable square
blocks is PSPACE complete. Draft, June 1994.

7. M. Buro. Simple Amazons endgames and their connection to Hamilton circuits in
cubic subgrid graphs. In Proc. 2nd Internat. Conf. Computers and Games, 2000.

8. J. H. Conway. On Numbers and Games. Academic Press, London, 1976.

9. J. H. Conway. The angel problem. In R. J. Nowakowski, ed., Games of No Chance,
pp. 1-12. Cambridge University Press, 1996.

10. J. Culberson. Sokoban is PSPACE-complete. In Proc. Internat. Conf. Fun with
Algorithms, pp. 65-76, Elba, Italy, June 1998.

11. S. de Vries and R. Vohra. Combinatorial auctions: A survey. Manuscript,
Jan. 2001. http://www-m9.ma.tum.de/~devries/comb_auction_supplement/comauction.pdf.

12. E. D. Demaine. Playing games with algorithms: Algorithmic combinatorial game
theory. Preprint cs.CC/0106019, Computing Research Repository.
http://www.arXiv.org/abs/cs.CC/0106019.

13. E. D. Demaine, M. L. Demaine, and D. Eppstein. Phutball endgames are NP-hard.
In R. J. Nowakowski, ed., More Games of No Chance, 2001. To appear.
http://www.arXiv.org/abs/cs.CC/0008025.

14. E. D. Demaine, M. L. Demaine, and J. O’Rourke. PushPush and Push-1 are
NP-hard in 2D. In Proc. 12th Canadian Conf. Comput. Geom., pp. 211-219, 2000.
http://www.cs.unb.ca/conf/cccg/eProceedings/26.ps.gz.

15. E. D. Demaine, M. L. Demaine, and H. Verrill. Coin-moving puzzles. In MSRI
Combinatorial Game Theory Research Workshop, Berkeley, California, July 2000.

16. E. D. Demaine and M. Hoffmann. Pushing blocks is NP-complete for noncrossing
solution paths. In Proc. 13th Canadian Conf. Comput. Geom., 2001. To appear.

17. A. Dhagat and J. O’Rourke. Motion planning amidst movable square blocks. In
Proc. 4th Canadian Conf. Comput. Geom., pp. 188-191, 1992.

18. D. Eppstein. Computational complexity of games and puzzles.
http://www.ics.uci.edu/~eppstein/cgt/hard.html.

19. J. Erickson. New Toads and Frogs results. In R. J. Nowakowski, ed., Games of
No Chance, pp. 299-310. Cambridge University Press, 1996.

20. G. W. Flake and E. B. Baum. Rush Hour is PSPACE-complete, or “Why you
should generously tip parking lot attendants”. Manuscript, 2001.
http://www.neci.nj.nec.com/homepages/flake/rushhour.ps.

21. A. S. Fraenkel. Combinatorial games: Selected bibliography with a succinct
gourmet introduction. Electronic Journal of Combinatorics, 1994. Dynamic Survey
DS2, http://www.combinatorics.org/Surveys/.

22. A. S. Fraenkel, M. R. Garey, D. S. Johnson, T. Schaefer, and Y. Yesha. The
complexity of checkers on an N × N board - preliminary report. In Proc. 19th
IEEE Sympos. Found. Comp. Sci., pp. 55-64, 1978.

23. A. S. Fraenkel and D. Lichtenstein. Computing a perfect strategy for n × n chess
requires time exponential in n. J. Combin. Theory Ser. A, 31:199-214, 1981.

24. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the
Theory of NP-Completeness. W. H. Freeman & Co., 1979.

25. P. M. Grundy. Mathematics and games. Eureka, 2:6-8, Oct. 1939.

26. R. K. Guy. Unsolved problems in combinatorial games. In R. J. Nowakowski, ed.,
Games of No Chance, pp. 475-491. Cambridge University Press, 1996.

27. R. K. Guy and C. A. B. Smith. The G-values of various games. Proc. Cambridge
Philos. Soc., 52:514-526, 1956.

28. M. Hoffmann. Push-* is NP-hard. In Proc. 12th Canadian Conf. Comput. Geom.,
pp. 205-210, 2000. http://www.cs.unb.ca/conf/cccg/eProceedings/13.ps.gz.

29. E. Hordern. Sliding Piece Puzzles. Oxford University Press, 1986.

30. R. Kaye. Minesweeper is NP-complete. Math. Intelligencer, 22(2):9-15, 2000.

31. M. Lachmann, C. Moore, and I. Rapaport. Who wins domineering on rectangular
boards? In MSRI Combinatorial Game Theory Research Workshop, Berkeley,
California, July 2000.

32. D. Lichtenstein and M. Sipser. GO is polynomial-space hard. J. Assoc. Comput.
Mach., 27(2):393-401, Apr. 1980.

33. C. H. Papadimitriou. Algorithms, games, and the Internet. In Proc. 33rd ACM
Sympos. Theory Comput., Crete, Greece, July 2001.

34. D. Ratner and M. Warmuth. The (n² − 1)-puzzle and related relocation problems.
J. Symbolic Comput., 10:111-137, 1990.

35. S. Reisch. Hex ist PSPACE-vollständig. Acta Inform., 15:167-191, 1981.

36. J. M. Robson. The complexity of Go. In Proceedings of the IFIP 9th World
Computer Congress on Information Processing, pp. 413-417, 1983.

37. J. M. Robson. N by N Checkers is Exptime complete. SIAM J. Comput.,
13(2):252-267, May 1984.

38. C. A. B. Smith. Graphs and composite games. J. Combin. Theory, 1:51-81, 1966.

39. R. Sprague. Über mathematische Kampfspiele. Tohoku Mathematical Journal,
41:438-444, 1935-36.

40. G. Wilfong. Motion planning in the presence of movable obstacles. Ann. Math.
Artificial Intelligence, 3(1):131-150, 1991.

41. D. Wolfe. Go endgames are PSPACE-hard. In R. J. Nowakowski, ed., More Games
of No Chance, 2001. To appear.

42. S. Wolfram. Cellular Automata and Complexity: Collected Papers. Perseus Press,
1994.



Some Recent Results on Data Mining and Search 



Amos Fiat 

Department of Computer Science, Tel Aviv University 



Abstract. In this talk we review and survey some recent work and 
work in progress on data mining and web search. We discuss Latent 
Semantic Analysis and give conditions under which it is robust. We also 
consider the problem of collaborative filtering and show how spectral 
techniques can give a rigorous and robust justification for doing so. We 
consider the problem of web search and show how both Google and
Kleinberg’s algorithm are robust under a model of web generation, and
how this model can be reasonably extended. We then give an algorithm 
that provably gives the correct result in this extended model. The results 
surveyed are joint work with Azar, Karlin, McSherry and Saia [2], and 
Achlioptas, Karlin and McSherry [1]. 



1 A General Data Mining Model and Applications 

We begin by presenting a general model that we believe captures many of the 
essential features of important data mining tasks. We then present a set of con- 
ditions under which data mining problems in this framework can be solved using 
spectral techniques, and use these results to theoretically justify the prior em- 
pirical success of these techniques for tasks such as object classification and web 
site ranking. We also use our theoretical framework as a foundation for
developing new algorithms for collaborative filtering. Our data mining models
allow both erroneous and missing data, and we show how and when spectral
techniques can overcome both.



Fig. 1. The data generation model (Generative Data Model, Error Process,
Probabilistic Omission)



The data mining model we introduce assumes that the data of interest can 
be represented as an object/attribute matrix. The model is depicted in Figure 1 
which shows how three fundamental phenomena combine to govern the process
by which a data set is created: 

1. A probabilistic model of data M: We assume that there exists an underlying
set of probability distributions that govern each object’s attribute values
(in the degenerate case, these values could be deterministically chosen).
These probability distributions are captured by the probabilistic model M in
the figure, where the random variable describing the i-th attribute of the j-th
object is denoted M_ij. The actual value of this attribute is then obtained
by sampling from the distribution M_ij; we denote the resulting value m_ij.
We assume that the M_ij’s are independent.

2. An error process Z: We assume that the data is noisy and error-ridden. 
The error process Z describes the manner by which the error is generated. We 
assume that the data value is corrupted by the addition of the error Z_ij.

3. An omission process P: Some of the data may not be available to the data 
miner. In our model, we assume that there is a probability distribution P 
governing the process by which data is omitted or made available. In particular,
the value m_ij + Z_ij is available to the data miner with probability p_ij,
and is omitted from the data set (which we represent by the presence of a
“?”) with probability 1 − p_ij. We denote by A* the resulting data set (which
is then input to the data mining algorithm).

The goal of the data mining algorithm: given A* as input (and no knowledge
of M, P or Z), obtain meaningful information about M. In particular,
we are interested in obtaining information about the matrix E(M), whose (i, j)
entry is the expectation of the random variable M_ij.

Clearly, without any assumptions about M, Z and P, it is hopeless to achieve 
the data mining goal just laid out. We present a general set of conditions under 
which it is possible to efficiently retrieve meaningful information about E(M). 
In essence, our results show that if the underlying data model is sufficiently 
“structured”, then the randomness of a probabilistic process, the addition of 
error and the fact that a significant fraction of the data may be missing will 
not prevent the data miner from recovering meaningful information about the 
“true” data. 

More formally, we prove the following general theorem: 

Theorem 1. Suppose that the availability matrix P is known to the data mining
algorithm, and its entries are bounded away from 0. In addition, suppose that
E(M) is a rank k matrix, and the 2-norm of the error matrix Z is o(σ_k), where
σ_k is the k-th singular value of E(M). Then there is a polynomial time algorithm,
that takes as input only P and A*, that is guaranteed to reconstruct 1 − o(1) of
the entries of E(M) to within an additive o(1) error.
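The following Python sketch illustrates the setting of Theorem 1 in the simplest case k = 1. It is a simplified illustration and should not be read as the algorithm of [2]. It assumes that E(M) has rank 1, that P is known, that the error Z has mean zero, and that omitted entries are represented by None. Dividing each observed entry by p_ij and writing 0 for each “?” produces a matrix whose expectation is E(M). Its best rank-1 approximation, computed here by power iteration, then serves as the estimate. Names such as generate and reconstruct_rank1 are hypothetical.

```python
import random


def generate(expected, availability, noise_sd, rng):
    """Sample A* from the three-stage model (degenerate case).

    expected[i][j] plays the role of E(M) (the m_ij are taken to be
    deterministic), Gaussian noise stands in for the error Z, and each entry
    is kept with probability availability[i][j]; None marks a "?".
    """
    return [
        [e + rng.gauss(0.0, noise_sd) if rng.random() < p else None
         for e, p in zip(row_e, row_p)]
        for row_e, row_p in zip(expected, availability)
    ]


def rank1_approximation(B, iters=100):
    """Best rank-1 approximation sigma * u * v^T of B via power iteration."""
    m, n = len(B), len(B[0])
    v = [1.0] * n  # assumed not orthogonal to the top right singular vector
    for _ in range(iters):
        u = [sum(B[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = sum(x * x for x in u) ** 0.5
        u = [x / norm_u for x in u]
        v = [sum(B[i][j] * u[i] for i in range(m)) for j in range(n)]
        sigma = sum(x * x for x in v) ** 0.5
        v = [x / sigma for x in v]
    return [[sigma * u[i] * v[j] for j in range(n)] for i in range(m)]


def reconstruct_rank1(A_star, availability):
    """Estimate E(M), assuming it has rank 1 and P is known."""
    # Rescale observed entries by 1/p_ij and zero-fill the "?" entries, so
    # each entry of B has expectation E(M)_ij when Z has mean zero.
    B = [
        [0.0 if a is None else a / p for a, p in zip(row_a, row_p)]
        for row_a, row_p in zip(A_star, availability)
    ]
    return rank1_approximation(B)
```

The rescaled matrix B is unbiased but very noisy entry by entry. The low-rank projection averages that noise over whole rows and columns, which is the intuition behind the spectral approach.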

Many data mining problems can be expressed as special cases of the
above theorem. We can model Latent Semantic Analysis (LSA), pioneered by
Deerwester et al. [4], via this model. We can also extend the work of
Papadimitriou, Raghavan, Tamaki and Vempala [7] on latent semantic indexing
to deal with arbitrary matrices rather than perturbed block matrices.

A fundamental problem in data mining, usually referred to as collaborative
filtering (or recommendation systems), is to use partial information that has
been collected about a group of users to make recommendations to individual
users. To our knowledge, there has been very little prior theoretical work on
collaborative filtering algorithms other than the work of Kumar, Raghavan,
Rajagopalan and Tomkins, who took an important first step of defining an
analytic framework for evaluating collaborative filtering [5].

We model the collaborative filtering problem within the framework of our 
general data mining model as follows: We assume that the utility of product j
for individual i is given by a random variable, and that the pattern of missing
data is determined by a probabilistic omission process P.

Kleinberg’s seminal work on web hubs and authorities has had a true impact 
on the real world [6]. 

It is an immediate consequence of our results that Kleinberg’s definition of 
importance is robust in the sense that the important sites will remain important 
(almost) irrespective of the actual random choices made when the “real world” 
is constructed. 
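For reference, Kleinberg’s hub and authority scores can be computed by power iteration. A page’s authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to, with normalization after each step. The Python sketch below follows that textbook description. The adjacency-list input format and the iteration count are assumptions. It does not implement the multi-dimensional algorithm presented in this talk.

```python
def hits(links, iters=50):
    """Hub and authority scores by power iteration (Kleinberg's HITS).

    links: dict mapping each page to the list of pages it links to.
    Returns (hub, authority) dicts, each normalized to unit Euclidean length.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    for _ in range(iters):
        # Authority of q: total hub weight of the pages that link to q.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {p: a / norm for p, a in auth.items()}
        # Hub of p: total authority weight of the pages p links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {p: h / norm for p, h in hub.items()}
    return hub, auth
```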



2 Web Search 

We present a generative model for web search which captures in a unified man- 
ner three critical components of the problem: how the link structure of the web 
is generated, how the content of a web document is generated, and how a hu- 
man searcher generates a query. The key to this unification lies in capturing the 
correlations between each of these components in terms of proximity in latent 
semantic space. Given such a combined model, the correct answer to a search 
query is well defined, and thus it becomes possible to evaluate web search al- 
gorithms rigorously. We present a new web search algorithm, based on spectral 
techniques, and prove that it is guaranteed to produce an approximately correct 
answer in our model. The algorithm assumes no knowledge of the model, and is 
well-defined regardless of the accuracy of the model. 

We like to think of the task at hand for web search as being the following: 

1. Take the human generated query and determine the topic to which the query
refers. There is an infinite set of topics on which humans may generate 
queries. 

2. Synthesize a perfect hub for this topic. The perfect hub is an imaginary
page; no page remotely resembling this imaginary hub need exist. On this
imaginary hub the authorities on the topic of the query will be listed in order 
of decreasing authoritativeness. 

We present a new algorithm that is a multi-dimensional generalization of 
both Kleinberg’s algorithm and of Google [3], which provably gives the correct 
result in the model. 
Abstract. This paper surveys recent results related to the concept of
hypertree decomposition and the associated notion of hypertree width. 
A hypertree decomposition of a hypergraph (similar to a tree decomposi- 
tion of a graph) is a suitable clustering of its hyperedges yielding a tree or 
a forest. Important NP hard problems become tractable if restricted to 
instances whose associated hypergraphs are of bounded hypertree width. 
We also review a number of complexity results on problems whose struc- 
ture is described by acyclic or nearly acyclic hypergraphs. 



1 Introduction 

One way of coping with an NP hard problem is to identify significantly large 
classes of instances that are both recognizable and solvable in polynomial time. 
Such instances are often defined via some structural property of a graph G(I)
that is associated in a canonical way with the instance I. For example, many
problems that are NP complete in general become tractable for instances I whose
associated graph has bounded treewidth (cf. Sect. 4). Treewidth is a measure of
the degree of cyclicity of a graph. Note that instances of bounded treewidth are
also easy to recognize, given that whether the treewidth of a graph is at most
k can be decided in linear time for each constant k.

The structure of a large number of problems is, however, more faithfully de- 
scribed by a hypergraph than by a graph. Again, several NP complete problems 
become tractable if restricted to instances with acyclic hypergraphs. In order to 
obtain larger tractable instance-classes of hypergraph-based problems, we thus 
investigated measures of hypergraph cyclicity that play a similar role for hyper- 
graphs as the concept of treewidth does for graphs. In particular, an appropriate 
notion of hypergraph width (and an associated method of hypergraph decompo- 
sition) should fulfil both of the following conditions: 

1. Relevant hypergraph-based problems should be solvable in polynomial time 
for instances of bounded width. 
2. For each constant k, one should be able to check in polynomial time whether
a hypergraph is of width k, and, in the positive case, it should be possible 
to produce an associated decomposition of width k of the given hypergraph. 

The existing measures of hypergraph cyclicity that we were aware of either do not
fulfil one of these two conditions (e.g. recognizing hypergraphs of bounded query
width is NP complete, cf. Section 4.3), or are not general enough (such methods
are mentioned in Section 10). In particular, various notions of hypergraph
width can be obtained by first transforming a hypergraph into a graph (there
are several ways of doing so, see Section 4.2) and then considering the treewidth
of that graph. However, it is not hard to see that such measures of cyclicity are
not very meaningful, due to the loss of structural information caused by the
transformation of the hypergraph into a graph (cf. Sect. 4.2). In summary, what
appeared to be missing was a satisfactory way of determining the degree of
cyclicity of a hypergraph on the basis of which large tractable classes of
instances of relevant NP-hard problems could be defined.

Consequently, after a careful analysis of the shortcomings of various hypergraph
decomposition methods, we introduced the new method of hypertree decomposition
and the associated notion of hypertree width. To the best of our knowledge,
the method of hypertree decomposition is currently the most general known
hypergraph decomposition method leading to large tractable classes of important
problems such as constraint satisfaction problems or conjunctive queries. The
notion of hypertree decomposition and the associated notion of hypertree width 
are the main topics of the present survey paper. However, we will also report 
on a number of other closely related issues, such as the precise (parallel) com- 
plexity of acyclic database queries, and the notion of query decomposition. Our 
results surveyed here are mainly from the following sources, where formal proofs, 
details, and a number of further results can be found: 

— Reference [17], where the precise complexity of acyclic Boolean conjunctive 
queries (ABCQs) is determined, and where highly parallel database algo- 
rithms for solving such queries are presented. (In the present paper, we will 
not discuss parallel database algorithms and refer the interested reader to [17] 
and [22]). 

— Reference [19], where we first study query width, a measure for the amount 
of cyclicity of a query introduced by Chekuri and Rajaraman [6], and where
we define and study the new (more general) concept of hypertree width. 

— Reference [21], where we establish criteria for comparing different CSP de- 
composition methods and where we compare various methods including the 
method of hypertree decomposition. The comparison criteria and the results 
of the comparison are reported in Section 10 of the present paper. 

— Reference [24], where hypertree width is compared to Courcelle’s notion of 
clique width [7,8]. 

— Reference [23], where we give a game theoretic and a logical characterization 
of hypertree width. These results are reported in Sections 8 and 9 of the 
present paper, respectively. 
This paper is organized as follows. In Section 2 we define a number of important
hypergraph-based problems. In Section 3 we discuss the complexity of
acyclic instances of these problems. In Section 4, we discuss graph treewidth and a
generalization termed query width. In Section 5 we define the concepts of hypertree
decomposition and hypertree width. In Section 6, we show how hypergraphs
of bounded hypertree width can be recognized in polynomial time. In Section 7,
we show how CSPs can be solved in polynomial time for instances of bounded
hypertree width. In Section 8, we describe the Robber and Marshals game, which
characterizes hypergraphs of bounded hypertree width. In Section 9, we briefly
describe our logical characterization of queries of bounded hypertree width. In
Section 10, we give a brief account of how the notion of hypertree decomposition
compares to other related notions. Finally, in Section 11, we state some relevant
open problems.

2 Hypergraph-Based Problems 

A relational vocabulary (short: vocabulary) consists of a finite nonempty set of
relation symbols P, Q, R, ... with associated arities. A finite relational structure
(short: finite structure) C over a vocabulary $\tau$ consists of a finite universe $U_C$
and, for each k-ary relation symbol R in $\tau$, a relation $R^C \subseteq U_C^k$. For a tuple
$(c_1, \ldots, c_k) \in R^C$ we often write $R^C(c_1, \ldots, c_k)$. We denote the vocabulary of a
structure C by vo(C).

Let A and B be two finite structures such that $vo(A) \subseteq vo(B)$. Then a
mapping $h : U_A \rightarrow U_B$ is a homomorphism from A to B if for each relation
symbol $R \in vo(A)$ it holds that whenever $(c_1, \ldots, c_k) \in R^A$ for some elements
$c_1, \ldots, c_k \in U_A$, then it also holds that $(h(c_1), \ldots, h(c_k)) \in R^B$. The
following is a fundamental computational problem in Algebra:

Definition 1 (The Homomorphism Problem HOM). Given two finite 
structures A and B, decide whether there exists a homomorphism from A to B. 
We denote such an instance of HOM by HOM(A, B).

It is well-known (cf. [12]) that HOM is an NP-complete problem. For example,
checking whether a graph (V, E) is three-colorable amounts to solving the
problem HOM(A, B) for structures A and B over a vocabulary with a unique
binary relation symbol R, where $R^A = E$ and $R^B$ = {(red, blue), (blue, red),
(red, green), (green, red), (blue, green), (green, blue)}.
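The reduction of three-colorability to HOM can be made concrete with a small brute-force sketch. It assumes a dictionary encoding of relational structures (relation name mapped to a set of tuples), and the function names are illustrative only. The search enumerates all $|U_B|^{|U_A|}$ mappings, so it merely shows the definition of a homomorphism at work and is not meant as an efficient decision procedure.

```python
from itertools import product

def is_homomorphism(h, rels_a, rels_b):
    """Check that h maps every tuple of each relation R^A into R^B."""
    for name, tuples in rels_a.items():
        target = rels_b.get(name, set())
        for t in tuples:
            if tuple(h[c] for c in t) not in target:
                return False
    return True

def find_homomorphism(universe_a, rels_a, universe_b, rels_b):
    """Return some homomorphism from A to B as a dict, or None (exhaustive search)."""
    elems = list(universe_a)
    for image in product(list(universe_b), repeat=len(elems)):
        h = dict(zip(elems, image))
        if is_homomorphism(h, rels_a, rels_b):
            return h
    return None

COLORS = {"red", "green", "blue"}
# R^B: all ordered pairs of distinct colors.
R_B = {(x, y) for x in COLORS for y in COLORS if x != y}

def three_colorable(vertices, edges):
    # R^A = E; one orientation per edge suffices because R^B is symmetric.
    rels_a = {"R": {tuple(e) for e in edges}}
    return find_homomorphism(vertices, rels_a, COLORS, {"R": R_B}) is not None

triangle = ({1, 2, 3}, [(1, 2), (2, 3), (1, 3)])
k4 = ({1, 2, 3, 4}, [(i, j) for i in range(1, 5) for j in range(i + 1, 5)])
print(three_colorable(*triangle))  # True
print(three_colorable(*k4))        # False
```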

In [12,30] it was observed that HOM is equivalent to (and actually, in essence, 
the same as) the important constraint satisfaction problem (CSP) of Artificial 
Intelligence [9], which, in turn, is equivalent to the database problem BCQ of 
evaluating Boolean conjunctive queries: 

Definition 2 (The Constraint Satisfaction Problem CSP). Given a finite
set Var of variables, a finite domain U of values, and a set of constraints
$\mathcal{C} = \{C_1, C_2, \ldots, C_q\}$, where each constraint $C_i$ is a pair $(S_i, r_i)$, where $S_i$ is
a list of variables of length $m_i$, called the constraint scope, and $r_i$ is an $m_i$-ary
relation over U, called a constraint relation, decide whether there is a substitution
$\vartheta : Var \rightarrow U$ such that, for each $1 \le i \le q$, $S_i\vartheta \in r_i$.

Definition 3 (The Boolean Conjunctive Query Problem BCQ). A relational
database is formalized as a finite relational structure D. A Boolean
conjunctive query (BCQ) on D is a sentence of first-order logic of the form
$$\exists X_1, \ldots, X_r \; R_1(t^1_1, t^1_2, \ldots, t^1_{\alpha(1)}) \wedge \cdots \wedge R_k(t^k_1, t^k_2, \ldots, t^k_{\alpha(k)}),$$
where, for $1 \le i \le k$, $R_i$ is a relation symbol from vo(D) of associated arity $\alpha(i)$, and
for $1 \le i \le k$ and $1 \le j \le \alpha(i)$, each $t^i_j$ is a term, i.e., either a variable from
the list $X_1, \ldots, X_r$, or a constant element from $U_D$. The decision problem BCQ
is the problem of deciding for a pair (D, Q), where D is a database and Q is
a Boolean conjunctive query, whether Q evaluates to true over D, denoted by
$D \models Q$.

Given that all variables occurring in a BCQ are existentially quantified, we
usually omit the quantifier prefix and write a BCQ as a conjunction of query
atoms. For example, let emp denote a relation containing (employee number, project number)
pairs, and let rel be a relation containing a pair $(n_1, n_2)$ if $n_1$ and $n_2$ are numbers
of distinct employees who are relatives; then the BCQ $emp(X, Z) \wedge emp(Y, Z) \wedge
rel(X, Y)$ expresses that there are two employees who are relatives and work on
the same project.
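As an illustration of Definition 3, the following sketch evaluates the employee query by trying every substitution of database values for the variables X, Y, Z. The encoding of atoms as (relation name, variable tuple) pairs, the function name, and the sample data are all hypothetical. The search is exponential in the number of variables; it is meant only to make the semantics of $D \models Q$ concrete.

```python
from itertools import product

def evaluate_bcq(atoms, db):
    """Decide whether D |= Q for a BCQ whose atoms contain only variables.

    atoms: list of (relation name, tuple of variable names)
    db:    dict mapping relation names to sets of tuples
    """
    variables = sorted({v for _, args in atoms for v in args})
    domain = list({c for rel in db.values() for t in rel for c in t})
    for values in product(domain, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if all(tuple(theta[v] for v in args) in db[name] for name, args in atoms):
            return True
    return False

# emp(X, Z) and emp(Y, Z) and rel(X, Y)
query = [("emp", ("X", "Z")), ("emp", ("Y", "Z")), ("rel", ("X", "Y"))]
db = {
    "emp": {("e1", "p1"), ("e2", "p1"), ("e3", "p2")},
    "rel": {("e1", "e3"), ("e3", "e1")},
}
print(evaluate_bcq(query, db))  # False: the only relatives work on different projects

db["rel"] |= {("e1", "e2"), ("e2", "e1")}
print(evaluate_bcq(query, db))  # True: e1 and e2 are relatives on project p1
```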

Note that by simple logspace operations (selections and projections on the 
corresponding relations) one can always eliminate constants occurring in BCQ 
atoms. We thus assume w.l.o.g. that query atoms contain only variables as ar- 
guments. By this assumption, CSP and BCQ are exactly the same problem, 
where each constraint scope corresponds to a query atom, each constraint relation
corresponds to a database relation, and vice-versa. Now each CSP (or BCQ)
instance I can in turn be identified with HOM(A, B), where A is the structure
whose universe $U_A$ consists of the set Var of all variables of I and whose relations
contain the constraint scopes (or query atoms) as tuples, and where B is the structure
whose universe $U_B$ is the finite domain U of I and whose relations are just
the constraint relations (or the database relations). In this sense, we can speak
about instances CSP(A, B) and BCQ(A, B), where A is a structure representing
the constraint scopes or the query, and B denotes the set of constraint relations
or the database, respectively. On the other hand, each instance I = HOM(A, B)
of HOM can be identified in the obvious way with a CSP instance (or a BCQ
instance) by interpreting the elements of $U_A$ as variables and those of $U_B$ as
domain elements of the constraint (or database) relations.

Thus all three problems HOM, CSP, and BCQ are the same and are NP- 
complete (for BCQ this was first shown in [5]). Therefore, it is important to 
find large classes of instances that can be evaluated in polynomial time. Such 
classes can be defined by imposing structural restrictions on the problem
instances. In particular, for HOM(A, B), CSP(A, B), or, equivalently, BCQ(A, B),
one could impose restrictions on the structure A, or on the structure B, or on
both (cf. [35,33,30]). In this paper we are interested in restrictions on the
structure A. In database terms, we can recast this by saying that we are interested in
the structure of the query, rather than in the properties of the database content.

If $\mathcal{A}$ denotes a set of structures, then HOM($\mathcal{A}$), CSP($\mathcal{A}$), and BCQ($\mathcal{A}$) denote
the restrictions of HOM, CSP, and BCQ to instances HOM(A, B), CSP(A, B),
and BCQ(A, B), respectively, where $A \in \mathcal{A}$.

Each finite structure C over universe $U_C$ defines a hypergraph $\mathcal{H}(C) = (V, H)$
as follows: the set V of vertices of $\mathcal{H}(C)$ coincides with $U_C$; the set H of
hyperedges of $\mathcal{H}(C)$ consists of all sets $\{c_1, \ldots, c_k\}$ such that there exists a
relation R in vo(C) with $(c_1, \ldots, c_k) \in R^C$.

To each problem instance I = HOM(A, B), or I = CSP(A, B), or
I = BCQ(A, B), we associate the hypergraph $\mathcal{H}_I = \mathcal{H}(A)$. In particular,
this means that for an instance I of CSP, $\mathcal{H}_I$ denotes the hypergraph whose
vertices are the variables of I and whose hyperedges are all sets $\{X_1, \ldots, X_k\}$
such that there exists a constraint scope $S = (X_1, \ldots, X_k)$ belonging to I.

For an instance I = (D, Q) of BCQ, $\mathcal{H}_I$ denotes the hypergraph whose vertices
are all the variables occurring in Q and whose hyperedges are the sets var(a) of
variables occurring in a, for each query atom a.

Example 4. Figure 1(a) shows $\mathcal{H}_{I_1}$ for an instance $I_1$ of BCQ having query
$Q_1 : a(S, X, T, R) \wedge b(S, Y, U, P) \wedge f(R, P, V) \wedge g(X, Y) \wedge c(T, U, Z) \wedge d(W, X, Z) \wedge e(Y, Z)$.
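The hypergraph of Example 4 can be read off the query mechanically: one hyperedge var(a) per atom a. The sketch below does this for $Q_1$. It assumes, as holds for $Q_1$, that atom names are distinct and that every variable is a single letter, so that each hyperedge can be keyed by its atom (as in the figures of this paper); the function name is illustrative.

```python
def query_hypergraph(atoms):
    """Return (vertices, hyperedges) of H_I: one hyperedge var(a) per query atom a."""
    hyperedges = {name: frozenset(args) for name, args in atoms}
    vertices = set().union(*hyperedges.values())
    return vertices, hyperedges

# Q1 from Example 4; each string lists the (single-letter) variables of an atom.
Q1 = [("a", "SXTR"), ("b", "SYUP"), ("f", "RPV"), ("g", "XY"),
      ("c", "TUZ"), ("d", "WXZ"), ("e", "YZ")]
vertices, hyperedges = query_hypergraph(Q1)
print(sorted(vertices))         # ['P', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
print(sorted(hyperedges["d"]))  # ['W', 'X', 'Z']
```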

It is furthermore easy to see that HOM, BCQ, and CSP are all equivalent (via
logspace transformations) to the following fundamental problems in database
theory and artificial intelligence [17]. The Query Output Tuple Problem: given a
conjunctive query Q, a database db, and a tuple t, determine whether t belongs
to the answer Q(db) of Q over db. The Conjunctive Query Containment Problem:
decide whether a conjunctive query $Q_1$ is contained in a conjunctive query $Q_2$,
i.e., whether, for each database instance db, the answer $Q_1(db)$ is a subset of
$Q_2(db)$. The Clause Subsumption Problem: check whether a (general) clause C
subsumes a clause D, i.e., whether there exists a substitution $\vartheta$ such that
$C\vartheta \subseteq D$. A (general) clause is a disjunction of (positive or negative)
literals, possibly containing function symbols. Note that subsumption is an
extremely important technique used in clause-based theorem proving [1].

Fig. 1. (a) Hypergraph $\mathcal{H}_{I_1}$; (b) a width 2 hypertree decomposition of $\mathcal{H}_{I_1}$; and
(c) the primal graph of $\mathcal{H}_{I_1}$

Just for the sake of presentation, in the rest of this paper we will focus on
the constraint satisfaction problem (CSP).

While, as we will see, many interesting structural properties of a CSP instance
I = CSP(A, B) can be identified by looking at the associated hypergraph
$\mathcal{H}_I = \mathcal{H}(A)$, which in AI is called the constraint hypergraph, some structural
properties of I may also be detected using its primal graph, i.e., the primal graph
of the hypergraph associated to A, which coincides with the Gaifman graph of A [15].
Let $\mathcal{H}_I = (V, H)$ be the constraint hypergraph of a CSP instance I. The primal
graph of I is a graph G = (V, E) having the same set of variables (vertices) as $\mathcal{H}_I$
and an edge connecting any pair of variables $X, Y \in V$ such that $\{X, Y\} \subseteq h$ for
some $h \in H$. Note that if the vocabulary of A contains only binary predicates,
then all constraints of I are binary and its associated hypergraph is identical to
its primal graph. The primal graph of the hypergraph of query $Q_1$ (and of the
equivalent CSP instance) is depicted in Fig. 1(c).
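Computing the primal graph only requires pairing up the variables inside each hyperedge. The sketch below builds the primal graph of $Q_1$ shown in Fig. 1(c); hyperedges are given as Python sets of single-letter variables, an encoding chosen only for brevity, and the function name is illustrative.

```python
from itertools import combinations

def primal_graph(hyperedges):
    """Edge {X, Y} whenever X and Y occur together in some hyperedge."""
    vertices = set().union(*hyperedges)
    edges = {frozenset(pair) for h in hyperedges for pair in combinations(sorted(h), 2)}
    return vertices, edges

Q1 = [set("SXTR"), set("SYUP"), set("RPV"), set("XY"), set("TUZ"), set("WXZ"), set("YZ")]
V, E = primal_graph(Q1)
print(len(V), len(E))        # 10 23
print(frozenset("SR") in E)  # True: S and R co-occur in a(S, X, T, R)
print(frozenset("WY") in E)  # False: W occurs only in d(W, X, Z)
```

For binary constraints the construction returns the hypergraph itself, matching the remark above.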

Since in this paper we always deal with hypergraphs corresponding to CSP
or BCQ instances, the vertices of any hypergraph $\mathcal{H} = (V, H)$ can be viewed
as the variables of some constraint satisfaction problem or of some conjunctive
query. Thus, we will often use the term variable as a synonym for vertex when
referring to elements of V. For the hypergraph $\mathcal{H} = (V, H)$, $var(\mathcal{H})$ and $edges(\mathcal{H})$
denote the sets V and H, respectively. When illustrating a decomposition, we will
usually represent hyperedges of the hypergraph $\mathcal{H}_I$ of a BCQ or CSP instance I
by their corresponding query atoms or constraint scopes.



3 Acyclic Instances 

The most basic and most fundamental structural property considered in the 
context of CSPs and conjunctive queries is acyclicity. It was recognized in AI 
and database theory that acyclic CSPs or conjunctive queries are polynomially 
solvable. A CSP instance I is acyclic if its associated hypergraph $\mathcal{H}_I$ is acyclic.

A hypergraph $\mathcal{H}$ is acyclic if and only if its primal graph G is chordal (i.e.,
any cycle of length greater than 3 has a chord) and the set of its maximal cliques
coincides with $edges(\mathcal{H})$ [2].

A join tree JT($\mathcal{H}$) for a hypergraph $\mathcal{H}$ is a tree whose nodes are the edges of
$\mathcal{H}$ such that whenever the same vertex $X \in V$ occurs in two edges $A_1$ and $A_2$
of $\mathcal{H}$, then $A_1$ and $A_2$ are connected in JT($\mathcal{H}$), and X occurs in each node on
the unique path linking $A_1$ and $A_2$ in JT($\mathcal{H}$). In other words, the set of nodes
in which X occurs induces a (connected) subtree of JT($\mathcal{H}$). We will refer to this
condition as the Connectedness Condition of join trees.
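The Connectedness Condition is easy to test once a candidate tree is given: for every vertex X, the tree nodes containing X must form a connected subtree. In the sketch below, the node names, the adjacency-set encoding, and the assumption that the given tree edges already form a tree are all illustrative choices, not part of the definition.

```python
def is_join_tree(nodes, tree_edges):
    """nodes: dict node name -> hyperedge (set of vertices).
    tree_edges: list of (u, v) pairs, assumed to form a tree over the nodes.
    Returns True iff the Connectedness Condition holds for every vertex."""
    adj = {n: set() for n in nodes}
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    for x in set().union(*nodes.values()):
        holders = {n for n, h in nodes.items() if x in h}
        # Depth-first search that never leaves the nodes containing x.
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            for m in adj[stack.pop()] & holders:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        if seen != holders:
            return False
    return True

path = {"ab": {"a", "b"}, "bc": {"b", "c"}, "cd": {"c", "d"}}
print(is_join_tree(path, [("ab", "bc"), ("bc", "cd")]))  # True
print(is_join_tree(path, [("ab", "cd"), ("cd", "bc")]))  # False: b is missing from cd
```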

Acyclic hypergraphs can be characterized in terms of join trees: A hypergraph 
H is acyclic iff it has a join tree [3,2,32]. 
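Besides searching for a join tree, acyclicity in this sense can also be tested with the well-known GYO reduction, which is not discussed in this paper and is used here only as an illustration: repeatedly delete vertices that occur in a single hyperedge, and hyperedges that are empty or contained in another hyperedge; the hypergraph is acyclic iff everything disappears. The following minimal sketch makes no attempt at the linear running time of [37].

```python
def is_acyclic(hyperedges):
    """GYO reduction: the hypergraph is (alpha-)acyclic iff these two rules empty it.

    Rule 1: delete a vertex that occurs in exactly one hyperedge.
    Rule 2: delete a hyperedge that is empty or contained in another hyperedge.
    """
    edges = [set(h) for h in hyperedges]
    changed = True
    while changed:
        changed = False
        for h in edges:  # Rule 1
            lonely = {v for v in h if sum(v in g for g in edges) == 1}
            if lonely:
                h -= lonely
                changed = True
        for i, h in enumerate(edges):  # Rule 2 (at most one deletion per round)
            if not h or any(j != i and h <= g for j, g in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges

triangle = [{"a", "b"}, {"b", "c"}, {"c", "a"}]
print(is_acyclic(triangle))                      # False
print(is_acyclic(triangle + [{"a", "b", "c"}]))  # True: a covering edge makes it acyclic
Q1 = [set("SXTR"), set("SYUP"), set("RPV"), set("XY"), set("TUZ"), set("WXZ"), set("YZ")]
print(is_acyclic(Q1))                            # False
```

The second example shows why this notion is the least restrictive one: adding a hyperedge that covers a cycle makes the hypergraph acyclic.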

Note that acyclicity as defined here is the usual concept of acyclicity in the
context of database theory and AI. It is referred to as α-acyclicity in [11]. This
is the least restrictive concept of hypergraph acyclicity among all those defined
in the literature.

Acyclic CSPs and conjunctive queries have highly desirable computational 
properties: 

1. Acyclic instances can be efficiently solved. Yannakakis provided a (sequen- 
tial) polynomial time algorithm solving BCQ on acyclic queries^ [41]. 

2. Acyclicity is efficiently recognizable, and a join tree of an acyclic hypergraph
is efficiently computable. A linear-time algorithm for computing a join tree
is shown in [37]; an $L^{SL}$ method has been provided in [17] ($L^{SL}$ denotes
logspace relativized by an oracle in symmetric logspace; this class could also
be termed “functional SL”).

3. The result of a (non-Boolean) acyclic conjunctive query Q can be computed 
in time polynomial in the combined size of the input instance and of the 
output relation [41]. 

4. Arc-consistency for acyclic CSP instances can be enforced in polynomial
time [9,10].

Intuitively, the efficient behavior of acyclic instances is due to the fact that
they can be evaluated by processing any of their join trees bottom-up by performing
upward semijoins, thus keeping the intermediate relations small (they could
become exponentially large if regular joins were performed).
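The bottom-up semijoin pass can be sketched for the Boolean case as follows. The sketch assumes that a join tree is already available (given as a children map with a chosen root), that each relation is stored together with the list of its variables, and that no variable is repeated inside an atom. Under these assumptions the query is true iff the root relation is non-empty after the pass. The relation names and data are made up.

```python
def semijoin(parent, child):
    """Semijoin of parent with child: keep the parent tuples that agree
    with some child tuple on the shared variables."""
    p_vars, p_rows = parent
    c_vars, c_rows = child
    shared = [v for v in p_vars if v in c_vars]
    p_idx = [p_vars.index(v) for v in shared]
    c_idx = [c_vars.index(v) for v in shared]
    keys = {tuple(row[i] for i in c_idx) for row in c_rows}
    return p_vars, {row for row in p_rows if tuple(row[i] for i in p_idx) in keys}

def acyclic_bcq(relations, children, root):
    """Upward semijoin pass over a join tree; the query holds iff the root stays non-empty."""
    def reduced(node):
        rel = relations[node]
        for child in children.get(node, []):
            rel = semijoin(rel, reduced(child))
        return rel
    return bool(reduced(root)[1])

# r(A, B) and s(B, C) and t(C, D), with join tree r - s - t rooted at s.
relations = {
    "r": (["A", "B"], {(1, 2)}),
    "s": (["B", "C"], {(2, 3), (5, 6)}),
    "t": (["C", "D"], {(3, 4)}),
}
children = {"s": ["r", "t"]}
print(acyclic_bcq(relations, children, "s"))  # True

relations["t"] = (["C", "D"], {(7, 4)})
print(acyclic_bcq(relations, children, "s"))  # False
```

Every semijoin only filters an existing relation, so no intermediate result is larger than an input relation.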

We have recently determined the precise computational complexity of BCQ,
and hence of HOM, CSP, and all their equivalent problems. It turned out that all
these problems are highly parallelizable on acyclic structures, as they are
complete for the low complexity class LOGCFL [17]. This is the class of all
decision problems that are logspace-reducible to a context-free language. Note that
$NL \subseteq LOGCFL \subseteq AC^1 \subseteq NC^2 \subseteq P$, where NL denotes nondeterministic logspace
and $AC^1$ and $NC^2$ are logspace-uniform classes based on the corresponding types
of Boolean circuits (for precise definitions of all these complexity classes, cf. [29]).
Let $\mathcal{AH}$ be the set of all finite acyclic relational structures.

Theorem 5 ([17]). CSP($\mathcal{AH}$) is LOGCFL-complete.

Moreover, the functional versions of these problems belong to the functional
version of LOGCFL, i.e., a solution for a CSP instance can be computed in
$L^{LOGCFL}$, i.e., functional logspace with an oracle in LOGCFL. Efficient parallel
algorithms, even for non-Boolean queries, have been proposed in [22]. They run
on parallel database machines that exploit inter-operation parallelism [40],
i.e., machines that execute different relational operations in parallel.

The important speed-up obtainable on acyclic instances stimulated several 
research efforts towards the identification of wider classes of queries and con- 
straints having the same desirable properties as acyclic CQs and CSPs. 

^ Note that, since both the database db and the query Q are part of an input-instance 
of BCQ, what we are considering is the combined complexity of the query [38]. 






Georg Gottlob et al. 



4 Treewidth, Query Width, and Hypertree Width 

4.1 Tree Decompositions and Treewidth of Graphs 

The treewidth of a graph is a well-known measure of its tree-likeness introduced 
by Robertson and Seymour in their work on graph minors [34]. This notion plays 
a central role in algorithmic graph theory as well as in many subdisciplines of 
Computer Science. 

Definition 6. A tree decomposition of a graph G = (V, E) is a pair $(T, \chi)$,
where T = (N, F) is a tree, and $\chi$ is a labeling function associating to each
vertex $p \in N$ a set of vertices $\chi(p) \subseteq V$, such that the following conditions are
satisfied: (1) for each vertex b of G, there exists $p \in N$ such that $b \in \chi(p)$; (2)
for each edge $\{b, d\} \in E$, there exists $p \in N$ such that $\{b, d\} \subseteq \chi(p)$; (3) for
each vertex b of G, the set $\{p \in N \mid b \in \chi(p)\}$ induces a (connected) subtree
of T.

The width of the tree decomposition $(T, \chi)$ is $\max_{p \in N} |\chi(p)| - 1$. The treewidth
of G is the minimum width over all its tree decompositions. The treewidth of a
CSP instance is the treewidth of its associated primal graph.
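Conditions (1) to (3) of Definition 6 and the width formula translate directly into a checker. In this sketch, bags and tree edges are plain Python dictionaries and lists (an illustrative encoding), the tree edges are assumed to form a tree, and the function name is hypothetical; it returns the width when the conditions hold and None otherwise.

```python
def td_width(graph_vertices, graph_edges, bags, tree_edges):
    """Return max |chi(p)| - 1 if (T, chi) is a tree decomposition of G, else None."""
    adj = {p: set() for p in bags}
    for p, q in tree_edges:
        adj[p].add(q)
        adj[q].add(p)

    def connected(nodes):
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for q in adj[stack.pop()] & nodes:
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen == nodes

    for b in graph_vertices:
        holders = {p for p, bag in bags.items() if b in bag}
        if not holders or not connected(holders):   # conditions (1) and (3)
            return None
    if not all(any({b, d} <= bag for bag in bags.values()) for b, d in graph_edges):
        return None                                  # condition (2)
    return max(len(bag) for bag in bags.values()) - 1

cycle_vertices = {"a", "b", "c", "d"}
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
good = {1: {"a", "b", "c"}, 2: {"a", "c", "d"}}
print(td_width(cycle_vertices, cycle_edges, good, [(1, 2)]))  # 2
bad = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}, 4: {"d", "a"}}
print(td_width(cycle_vertices, cycle_edges, bad, [(1, 2), (2, 3), (3, 4)]))  # None
```

In the second call every edge is covered, but the bags containing a (nodes 1 and 4) are not connected in the path, so condition (3) fails.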

The notion of treewidth is a generalization of graph acyclicity. In particular,
a graph is acyclic if and only if its treewidth is one [34].

Checking whether a graph has treewidth at most k for a fixed constant k,
and in the positive case computing a k-width tree decomposition, is feasible in
linear time [4]. Moreover, this task is also parallelizable. Indeed, Wanke [39] has 
shown that, for a fixed constant k, checking whether a graph has treewidth k is 
in LOGCFL. By proving some general complexity-theoretic results and by using 
Wanke’s result, the following was shown in [18]: 

Theorem 7 ([18]). For each constant k, there exists an $L^{LOGCFL}$ transducer
$T_k$ that behaves as follows on input G. If G is a graph of treewidth $\le k$,
then $T_k$ outputs a tree decomposition of width $\le k$ of G. Otherwise, $T_k$ halts with
empty output.

Thus, a tree decomposition of width at most k can also be computed in (the
functional version of) LOGCFL, and thus by logspace-uniform $AC^1$ and $NC^2$
circuits.

An important feature of treewidth is that many NP-complete problems are 
decidable in polynomial time on structures having bounded treewidth, i.e., having
treewidth at most k for some fixed constant k > 0. In particular, Courcelle
proved that every property expressible in monadic second order logic is decidable 
in linear time over bounded treewidth graphs [7]. 

4.2 Treewidth of Hypergraphs 

As mentioned in the previous section, many NP-complete problems become 
tractable on bounded treewidth graphs. In order to exploit this nice feature 



Hypertree Decompositions: A Survey 






for CSP, BCQ, and their equivalent problems, many researchers in the AI and
the database communities considered the primal graph (of the hypergraph)
associated to the relational structure. Let TW[k] be the set of all finite relational
structures whose associated primal graph has treewidth at most k. It has been
shown that CSP(TW[k]) is solvable in polynomial time [14] and has the same
properties as CSP($\mathcal{AH}$), including its precise computational complexity.

Theorem 8 ([17]). CSP(TW[k]) is LOGCFL-complete.

Note that considering the primal graph associated to a hypergraph is not
the only possible choice. Given a CSP instance I, the dual graph [9,10,32] of the
hypergraph $\mathcal{H}_I$ is a graph $G_I = (V, E)$ defined as follows: the set of vertices V
coincides with the set of (hyper)edges of $\mathcal{H}_I$, and the set E contains an edge
$\{h, h'\}$ for each pair of vertices $h, h' \in V$ such that $h \cap h' \neq \emptyset$. That is, there is
an edge between any pair of vertices corresponding to hyperedges of $\mathcal{H}_I$ sharing
some variable.
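A small sketch of the dual graph construction, with hyperedges given as a dictionary from (hypothetical) constraint names to variable sets. The example hypergraph, three binary constraints sharing the variable x, is acyclic, yet its dual graph is a triangle, which illustrates that acyclic CSPs need not have acyclic dual graphs.

```python
from itertools import combinations

def dual_graph(hyperedges):
    """Vertices: hyperedge names; edge {h, h'} whenever h and h' share a variable."""
    names = sorted(hyperedges)
    edges = {frozenset((h, g)) for h, g in combinations(names, 2)
             if hyperedges[h] & hyperedges[g]}
    return names, edges

star = {"p": {"x", "a"}, "q": {"x", "b"}, "r": {"x", "c"}}
names, edges = dual_graph(star)
print(names)                             # ['p', 'q', 'r']
print(sorted(sorted(e) for e in edges))  # [['p', 'q'], ['p', 'r'], ['q', 'r']]
```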

The dual graph often looks very intricate even for simple CSPs. For instance,
in general, acyclic CSPs do not have acyclic dual graphs. However, it
is well known that the dual graph $G_I$ can be suitably simplified in order to
obtain a “better” graph G' which can still be used to solve the given CSP
instance I. In particular, if I is an acyclic CSP, $G_I$ can be reduced to an acyclic
graph that represents a join tree of $\mathcal{H}_I$. In this case, the reduction is feasible
in polynomial (actually, linear) time (see, e.g., [32]). However, in general, it
is not known whether there exists an efficient algorithm for obtaining the best
simplified graph G' with respect to the treewidth notion, i.e., the simplification
of $G_I$ having the smallest treewidth over all its possible simplifications (see [30]
for a formal statement of this open problem and [21] for a comparison of this
notion with some hypergraph-based notions).

Another possibility is to consider the so-called incidence graph [6]. Given
a CSP instance I, the incidence graph $G_i(\mathcal{H}_I) = (V', E)$ associated to the
hypergraph $\mathcal{H}_I = (V, H)$ has a vertex for each variable and for each hyperedge
of $\mathcal{H}_I$. There is an edge $\{x, h\} \in E$ between a variable $x \in V$ and a hyperedge
$h \in H$ whenever x occurs in h.
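The incidence graph is bipartite, with one side for variables and one for hyperedges. The sketch below represents each of its edges as a (variable, hyperedge name) pair, an encoding chosen so that variable and constraint names cannot be confused; the names are illustrative.

```python
def incidence_graph(hyperedges):
    """Return (variables, hyperedge names, edges), with an edge (x, h) whenever x occurs in h."""
    variables = set().union(*hyperedges.values())
    edges = {(x, h) for h, scope in hyperedges.items() for x in scope}
    return variables, set(hyperedges), edges

scopes = {"g": {"X", "Y"}, "e": {"Y", "Z"}}
variables, constraints, edges = incidence_graph(scopes)
print(sorted(edges))  # [('X', 'g'), ('Y', 'e'), ('Y', 'g'), ('Z', 'e')]
```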

The classes of all CSP instances whose dual graphs (resp. incidence graphs)
have bounded treewidth are solvable in polynomial time and, actually, they are
LOGCFL-complete. However, note that none of these classes of CSP instances
generalizes the class CSP($\mathcal{AH}$). Indeed, there are families of acyclic hypergraphs
whose associated primal graphs, dual graphs (without considering simplification),
and incidence graphs have unbounded treewidth.

Note that by results of [25] bounded treewidth is most likely the best and 
most general structural restriction for obtaining tractable CSP and BCQ in- 
stances, when the structure of a CSP or BCQ is described via a graph (e.g. 
the primal graph), rather than by a hypergraph. Further interesting material on 
BCQ and treewidth can be found in [13]. 
4.3 Query Decompositions and Query Width

A more general notion that generalizes hypergraph acyclicity is query width [6].
The notion of bounded query width is based on the concept of query decomposition
[6]. We next adapt this notion to the more general setting of hypergraphs,
while it was originally defined in terms of queries. Roughly, a query decomposition
of a hypergraph $\mathcal{H}$ consists of a tree each vertex of which is labelled by a
set of hyperedges and/or variables. Each variable and hyperedge induces a
connected subtree (connectedness condition). Each hyperedge occurs in at least one
label. The width of a query decomposition is the maximum cardinality of the labels
of its vertices. The query width qw($\mathcal{H}$) of $\mathcal{H}$ is the minimum width over all its
query decompositions.

Example 9. Consider the CSP instance $I_2$ having the following constraint scopes:

a(S, X, X', C, F), b(S, Y, Y', C', F'), c(C, C', Z), d(X, Z), e(Y, Z),

f(F, F', Z'), g(X', Z'), h(Y', Z'), j(J, X, Y, X', Y')

The query width of $\mathcal{H}_{I_2}$ is 3. Figure 2 shows a query decomposition of $\mathcal{H}_{I_2}$ of
width 3. W.l.o.g. we represent hyperedges by the corresponding constraint scopes
or query atoms in such decompositions.




Fig. 2. A 3-width query decomposition of $\mathcal{H}_{I_2}$



Each hypergraph whose primal graph has treewidth at most k has query
width at most k, too. The converse does not hold, in general. Moreover, this
notion is a true generalization of the basic concept of acyclicity: a hypergraph
is acyclic iff it has query width 1.

Let k be a fixed constant. Chekuri and Rajaraman [6] proved that, given a
BCQ instance I and a query decomposition of $\mathcal{H}_I$ having width at most k, I is
solvable in polynomial time. In [17] it was shown that this problem is
LOGCFL-complete (and thus highly parallelizable).
However, when the notion of query width was defined and studied in [6],
no polynomial algorithm for checking whether a hypergraph has query width
at most k was known, and Chekuri and Rajaraman [6] stated this as an open
problem. This problem is solved in [19], where it is shown that an efficient
algorithm for recognizing instances of bounded query width is unlikely to exist.

Theorem 10 ([19]). Determining whether the query width of a hypergraph is
at most 4 is NP-complete.

Fortunately, it turned out that the high complexity of determining bounded 
query-width is not, as one would usually expect, the price for the generality of 
the concept. Rather, it is due to some peculiarity in its definition. In the next 
section, we present a new notion that does not suffer from such problems. Indeed, 
this notion generalizes query width (and hence acyclicity) and is tractable. 

5 Hypertree Decompositions and Hypertree Width 

A new class of tractable CSP instances, which generalizes the class CSP($\mathcal{AH}$) of
CSP instances having an acyclic hypergraph, has recently been identified [19].
This is the class of CSPs whose hypergraph has a bounded-width hypertree 
decomposition [19]. 

A hypertree for a hypergraph $\mathcal{H}$ is a triple $(T, \chi, \lambda)$, where T = (N, E) is a
rooted tree, and $\chi$ and $\lambda$ are labeling functions which associate to each vertex
$p \in N$ two sets $\chi(p) \subseteq var(\mathcal{H})$ and $\lambda(p) \subseteq edges(\mathcal{H})$. If T' = (N', E') is a
subtree of T, we define $\chi(T') = \bigcup_{v \in N'} \chi(v)$. We denote the set of vertices N
of T by vertices(T), and the root of T by root(T). Moreover, for any $p \in N$, $T_p$
denotes the subtree of T rooted at p.

Definition 11. A hypertree decomposition of a hypergraph $\mathcal{H}$ is a hypertree
$HD = (T, \chi, \lambda)$ for $\mathcal{H}$ which satisfies all the following conditions:

1. for each edge $h \in edges(\mathcal{H})$, there exists $p \in vertices(T)$ such that $var(h) \subseteq
\chi(p)$ (we say that p covers h);

2. for each variable $Y \in var(\mathcal{H})$, the set $\{p \in vertices(T) \mid Y \in \chi(p)\}$ induces
a (connected) subtree of T;

3. for each $p \in vertices(T)$, $\chi(p) \subseteq var(\lambda(p))$;

4. for each $p \in vertices(T)$, $var(\lambda(p)) \cap \chi(T_p) \subseteq \chi(p)$.

Note that the inclusion in Condition 4 is actually an equality, because Con- 
dition 3 implies the reverse inclusion. 

An edge $h \in edges(\mathcal{H})$ is strongly covered in HD if there exists $p \in vertices(T)$
such that $var(h) \subseteq \chi(p)$ and $h \in \lambda(p)$. We then say that p strongly covers h.

A hypertree decomposition HD of a hypergraph $\mathcal{H}$ is a complete decomposition
of $\mathcal{H}$ if every edge of $\mathcal{H}$ is strongly covered in HD.

The width of a hypertree decomposition $(T, \chi, \lambda)$ is $\max_{p \in vertices(T)} |\lambda(p)|$.
The hypertree width $hw(\mathcal{H})$ of $\mathcal{H}$ is the minimum width over all its hypertree
decompositions. A c-width hypertree decomposition of $\mathcal{H}$ is optimal if $c = hw(\mathcal{H})$.
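Definition 11 and the notion of width can be turned into a direct checker. In this sketch the rooted tree is given as a children map, $\chi$ and $\lambda$ as dictionaries, and hyperedges by name; all of these encodings, and the function name, are illustrative. It returns the width $\max_p |\lambda(p)|$ when Conditions 1 to 4 hold and None otherwise.

```python
def hd_width(hyperedges, children, root, chi, lam):
    """Return max |lam(p)| if (T, chi, lam) satisfies Conditions 1-4 of Definition 11, else None.

    hyperedges: name -> set of variables;  children: node -> list of children
    chi: node -> set of variables;         lam: node -> set of hyperedge names
    """
    nodes = list(chi)
    var_lam = {p: set().union(*(hyperedges[h] for h in lam[p])) for p in nodes}
    chi_sub = {}  # chi(T_p): variables occurring in the subtree rooted at p

    def collect(p):
        chi_sub[p] = set(chi[p]).union(*(collect(c) for c in children.get(p, [])))
        return chi_sub[p]

    collect(root)
    adj = {p: set() for p in nodes}
    for p in nodes:
        for c in children.get(p, []):
            adj[p].add(c)
            adj[c].add(p)

    def connected(ns):
        start = next(iter(ns))
        seen, stack = {start}, [start]
        while stack:
            for q in adj[stack.pop()] & ns:
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen == ns

    if not all(any(h <= chi[p] for p in nodes) for h in hyperedges.values()):
        return None                                    # Condition 1
    for y in set().union(*hyperedges.values()):
        if not connected({p for p in nodes if y in chi[p]}):
            return None                                # Condition 2
    for p in nodes:
        if not chi[p] <= var_lam[p] or not (var_lam[p] & chi_sub[p]) <= chi[p]:
            return None                                # Conditions 3 and 4
    return max(len(lam[p]) for p in nodes)

H = {"ab": {"a", "b"}, "bc": {"b", "c"}, "ca": {"c", "a"}}
one_node = hd_width(H, {}, "r", {"r": {"a", "b", "c"}}, {"r": {"ab", "bc"}})
print(one_node)  # 2
bad = hd_width(H, {"r": ["s"]}, "r",
               {"r": {"a"}, "s": {"a", "b", "c"}},
               {"r": {"ab", "bc"}, "s": {"ab", "bc"}})
print(bad)       # None: Condition 4 fails at r
```

The second call violates only Condition 4: the root's $\lambda$ covers b and c, which reappear below it without being kept in $\chi$(r).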
The acyclic hypergraphs are precisely those hypergraphs having hypertree
width one. Indeed, any join tree of an acyclic hypergraph $\mathcal{H}$ trivially corresponds
to a hypertree decomposition of $\mathcal{H}$ of width one. Furthermore, if a hypergraph
$\mathcal{H}'$ has a hypertree decomposition of width one, then, from this decomposition,
we can easily compute a join tree of $\mathcal{H}'$, which is therefore acyclic [19].

It is worthwhile noting that from any hypertree decomposition HD of $\mathcal{H}$, we
can easily compute a complete hypertree decomposition of $\mathcal{H}$ having the same
width in $O(\|\mathcal{H}\| \cdot \|HD\|)$ time.

Intuitively, if $\mathcal{H}$ is a cyclic hypergraph, the $\chi$ labeling selects the set of variables
to be fixed in order to split the cycles and achieve acyclicity; $\lambda(p)$ “covers”
the variables of $\chi(p)$ by a set of edges.




Fig. 3. A hypertree decomposition of $\mathcal{H}_{I_2}$



Example 12. Figure 3 shows a hypertree decomposition $HD_2$ of the cyclic
hypergraph $\mathcal{H}_{I_2}$ associated to the CSP instance in Example 9. Each node p in the
tree is labeled by a set of hyperedges representing $\lambda(p)$; $\chi(p)$ is the set of all
variables, distinct from ‘_’, appearing in these hyperedges. Thus, the anonymous
variable ‘_’ replaces the variables in $var(\lambda(p)) - \chi(p)$.

Using this graphical representation, we can easily observe an important feature
of hypertree decompositions. Once a hyperedge has been covered by some
vertex of the decomposition tree, any subset of its variables can be used freely
in order to decompose the remaining cycles in the hypergraph. For instance,
the variables in the hyperedge corresponding to constraint j in $\mathcal{H}_{I_2}$ are jointly
included only in the root of the decomposition. If we were forced to take all
the variables in every vertex where j occurs, it would not be possible to find a
decomposition of width 2. Indeed, in this case, any choice of two hyperedges per
vertex yields a hypertree which violates the connectedness condition for variables
(i.e., Condition 2 of Definition 11).
Figure 1(b) shows a complete hypertree decomposition of width 2 of the
hypergraph in part (a) of the figure. Note that this decomposition also 
happens to be a query decomposition of width 2. 

Let k be a fixed positive integer. We say that a CSP instance I has k-bounded
hypertree width if $hw(\mathcal{H}_I) \le k$, where $\mathcal{H}_I$ is the hypergraph associated to I.

6 Computing Hypertree Decompositions 

Let $\mathcal{H}$ be a hypergraph, let $V \subseteq var(\mathcal{H})$ be a set of variables, and let
$X, Y \in var(\mathcal{H})$. Then X is [V]-adjacent to Y if there exists an edge $h \in edges(\mathcal{H})$ such
that $\{X, Y\} \subseteq h - V$. A [V]-path $\pi$ from X to Y is a sequence $X = X_0, \ldots, X_\ell = Y$
of variables such that $X_i$ is [V]-adjacent to $X_{i+1}$, for each $i \in [0 \ldots \ell - 1]$. A set
$W \subseteq var(\mathcal{H})$ of variables is [V]-connected if, for all $X, Y \in W$, there is a
[V]-path from X to Y. A [V]-component is a maximal [V]-connected non-empty
set of variables $W \subseteq var(\mathcal{H}) - V$. For any [V]-component C, let
$edges(C) = \{h \in edges(\mathcal{H}) \mid h \cap C \neq \emptyset\}$.
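The [V]-components can be computed by a graph search that only moves between variables sharing a hyperedge outside V. A minimal sketch, with hyperedges as Python sets and illustrative function names:

```python
def components(hyperedges, sep):
    """Return the [V]-components of the hypergraph for V = sep, as a list of sets."""
    sep = set(sep)
    remaining = set().union(*hyperedges) - sep
    comps = []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for h in hyperedges:
                if x in h:
                    for y in h - sep:  # x and y are [V]-adjacent
                        if y not in comp:
                            comp.add(y)
                            stack.append(y)
        remaining -= comp
        comps.append(comp)
    return comps

def edges_of(hyperedges, comp):
    """edges(C): the hyperedges that meet the component C."""
    return [h for h in hyperedges if h & comp]

cycle = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "a"}]
print(sorted(sorted(c) for c in components(cycle, {"a", "c"})))  # [['b'], ['d']]
print(sorted(sorted(c) for c in components(cycle, {"a"})))       # [['b', 'c', 'd']]
```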

Let $HD = (T, \chi, \lambda)$ be a hypertree for $\mathcal{H}$. For any vertex v of T, we
will often use v as a synonym of $\chi(v)$. In particular, [v]-component denotes
[$\chi(v)$]-component, the term [v]-path is a synonym of [$\chi(v)$]-path, and so on. We
introduce a normal form for hypertree decompositions.



ALTERNATING ALGORITHM k-decomp
Input: A non-empty hypergraph H.

Result: "Accept", if H has k-bounded hypertree-width; "Reject", otherwise.

Procedure k-decomposable(C_R: SetOfVariables, R: SetOfHyperedges)

begin

1) Guess a set S ⊆ edges(H) of at most k elements;

2) Check that all the following conditions hold:

   2.a) ∀P ∈ edges(C_R), (var(P) ∩ var(R)) ⊆ var(S) and
   2.b) var(S) ∩ C_R ≠ ∅

3) If the check above fails Then Halt and Reject; Else

   Let 𝒞 := {C ⊆ var(H) | C is a [var(S)]-component and C ⊆ C_R};

4) If, for each C ∈ 𝒞, k-decomposable(C, S)

   Then Accept
   Else Reject

end;

begin (* MAIN *)

   Accept if k-decomposable(var(H), ∅)

end.



Fig. 4. A non-deterministic algorithm deciding k-bounded hypertree-width
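To make the control flow of Fig. 4 concrete, here is a deterministic Python sketch under our own encoding: a hypergraph is a list of sets of variable names, and `has_k_bounded_hw` and `v_components` are hypothetical names. The existential guess of Step 1 becomes a loop over all sets of 1 to k edges, and the universal Step 4 becomes `all(...)`, with results memoised on the pair $(C_R, R)$. This exhaustive search is exponential. It is neither the logspace ATM of the text nor opt-k-decomp [20]; it only mirrors the acceptance condition.

```python
from functools import lru_cache
from itertools import combinations


def v_components(edges, V):
    """[V]-components of var(H) - V, found by depth-first search."""
    rest = set().union(*edges) - set(V)
    seen, comps = set(), []
    for start in rest:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            comp.add(x)
            for h in edges:
                if x in h:
                    # every y in h - V is [V]-adjacent to x
                    for y in h:
                        if y in rest and y not in seen:
                            seen.add(y)
                            stack.append(y)
        comps.append(frozenset(comp))
    return comps


def has_k_bounded_hw(edges, k):
    """Deterministic (exponential) simulation of k-decomp from Fig. 4."""
    edges = [frozenset(h) for h in edges]

    def var(S):
        return frozenset().union(*S)

    @lru_cache(maxsize=None)
    def decomposable(CR, R):
        varR = var(R)
        # Step 1: try every S with 1..k edges instead of guessing one.
        for size in range(1, k + 1):
            for S in combinations(edges, size):
                vS = var(S)
                # Step 2.a: for all P in edges(CR), var(P) ∩ var(R) ⊆ var(S).
                if not all(P & varR <= vS for P in edges if P & CR):
                    continue
                # Step 2.b: var(S) ∩ CR ≠ ∅.
                if not vS & CR:
                    continue
                # Step 3: the [var(S)]-components contained in CR.
                comps = [C for C in v_components(edges, vS) if C <= CR]
                # Step 4: the universal branching becomes all(...).
                if all(decomposable(C, frozenset(S)) for C in comps):
                    return True
        return False  # no choice of S works: "Reject"

    return decomposable(frozenset().union(*edges), frozenset())


triangle = [{"a", "b"}, {"b", "c"}, {"c", "a"}]
print(has_k_bounded_hw(triangle, 1), has_k_bounded_hw(triangle, 2))  # False True
```

Each recursive call receives a component strictly smaller than $C_R$, because Step 2.b forces $var(S)$ to remove at least one variable of $C_R$. This is why the search terminates.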
Definition 13 ([19]). A hypertree decomposition $HD = \langle T, \chi, \lambda \rangle$ of a hypergraph
$\mathcal{H}$ is in normal form (NF) if, for each vertex $r \in vertices(T)$, and for each
child $s$ of $r$, all the following conditions hold:

1. there is (exactly) one $[r]$-component $C_r$ such that $\chi(T_s) = C_r \cup (\chi(s) \cap \chi(r))$;

2. $\chi(s) \cap C_r \neq \emptyset$, where $C_r$ is the $[r]$-component satisfying Condition 1;

3. $var(\lambda(s)) \cap \chi(r) \subseteq \chi(s)$.

Intuitively, each subtree rooted at a child node s of some node r of a normal 
form decomposition tree serves to decompose precisely one [r]-component. 
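Definition 13 can be checked directly on a given decomposition tree. The sketch below uses an encoding of our own: `chi[v]` is the frozenset $\chi(v)$, `lam[v]` is the set of indices of the hyperedges in $\lambda(v)$, and `children[v]` lists the children of $v$. `is_normal_form` is a hypothetical helper that tests Conditions 1 to 3 for every parent/child pair. It does not verify that the tree is a hypertree decomposition in the first place.

```python
def v_components(edges, V):
    """[V]-components of var(H) - V, found by depth-first search."""
    rest = set().union(*edges) - set(V)
    seen, comps = set(), []
    for start in rest:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            comp.add(x)
            for h in edges:
                if x in h:
                    for y in h:
                        if y in rest and y not in seen:
                            seen.add(y)
                            stack.append(y)
        comps.append(frozenset(comp))
    return comps


def subtree_chi(node, chi, children):
    """chi(T_s): the union of chi over the subtree rooted at node."""
    out = set(chi[node])
    for c in children.get(node, []):
        out |= subtree_chi(c, chi, children)
    return frozenset(out)


def is_normal_form(edges, chi, lam, children, root):
    """Check Conditions 1-3 of Definition 13 for every parent/child pair."""
    stack = [root]
    while stack:
        r = stack.pop()
        comps = v_components(edges, chi[r])  # the [r]-components
        for s in children.get(r, []):
            stack.append(s)
            Ts = subtree_chi(s, chi, children)
            # Condition 1: exactly one C_r with chi(T_s) = C_r ∪ (chi(s) ∩ chi(r)).
            matching = [C for C in comps if Ts == C | (chi[s] & chi[r])]
            if len(matching) != 1:
                return False
            Cr = matching[0]
            # Condition 2: chi(s) ∩ C_r is non-empty.
            if not chi[s] & Cr:
                return False
            # Condition 3: var(lambda(s)) ∩ chi(r) ⊆ chi(s).
            var_lam = frozenset().union(*(edges[i] for i in lam[s]))
            if not (var_lam & chi[r]) <= chi[s]:
                return False
    return True


H = [frozenset("abc"), frozenset("cd"), frozenset("de")]
chi = {"r": frozenset("abc"), "s": frozenset("cd"), "t": frozenset("de")}
lam = {"r": {0}, "s": {1}, "t": {2}}
children = {"r": ["s"], "s": ["t"]}
print(is_normal_form(H, chi, lam, children, "r"))  # True
```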

Theorem 14 ([19]). For each $k$-width hypertree decomposition of a hypergraph
$\mathcal{H}$ there exists a $k$-width hypertree decomposition of $\mathcal{H}$ in normal form.

This normal form theorem immediately entails that, for each optimal hypertree
decomposition of a hypergraph $\mathcal{H}$, there exists an optimal hypertree
decomposition of $\mathcal{H}$ in normal form.

Importantly, NF hypertree decompositions can be efficiently computed. Figure 4
shows the algorithm k-decomp, deciding whether a given hypergraph $\mathcal{H}$ has
a $k$-bounded hypertree-width decomposition. k-decomp can be implemented
on a logspace ATM having polynomially bounded tree-size, and therefore entails
LOGCFL membership of deciding $k$-bounded hypertree width.

Theorem 15 ([19]). Deciding whether a hypergraph $\mathcal{H}$ has $k$-bounded
hypertree width is in LOGCFL.

From an accepting computation of the algorithm of Figure 4 we can efficiently
extract an NF hypertree decomposition. Since an accepting computation tree of a
bounded-treesize logspace ATM can be computed in (the functional version of)
LOGCFL [18], we obtain the following.

Theorem 16 ([19]). Computing a $k$-bounded hypertree decomposition (if any)
of a hypergraph $\mathcal{H}$ is in $L^{LOGCFL}$, i.e., in functional LOGCFL.

As for sequential algorithms, a polynomial-time algorithm opt-k-decomp
which, for a fixed $k$, decides whether a hypergraph has $k$-bounded hypertree
width and, in this case, computes an optimal hypertree decomposition in normal
form is described in [20]. As for many other decomposition methods, the running
time of this algorithm to find the hypergraph decomposition is exponential in
the parameter $k$. More precisely, opt-k-decomp runs in … time, where $m$
and $v$ are the number of edges and the number of vertices of $\mathcal{H}$, respectively.

7 Solving CSP Instances of Bounded Hypertree Width

Figure 5 outlines an efficient method to solve CSP instances of bounded hypertree
width. The key point is that any CSP instance $I$ having $k$-bounded
hypertree width can be efficiently transformed into an equivalent acyclic CSP
instance (Step 4), which is then evaluated by the well-known techniques defined
for acyclic CSPs (see Section 3). Let $HW[k]$ be the set of all finite relational
structures whose associated hypergraph has hypertree width at most $k$.
ALGORITHM

Input: A k-bounded hypertree width CSP instance I.

Result: A solution to I, if I is satisfiable; "No", otherwise.

begin

1) Build the hypergraph H_I of I.

2) Compute a k-width hypertree decomposition HD of H_I in normal form.

3) Compute from HD a complete hypertree decomposition HD' = ⟨T, χ, λ⟩ of H_I.

4) Compute from HD' and I an acyclic instance I* equivalent to I.

5) Evaluate I* employing any efficient technique for solving acyclic CSPs.

6) If I* is satisfiable, then return a solution to I*;

   Else Return "No"

end.



Fig. 5. An algorithm solving CSP instances of k-bounded hypertree-width
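Step 5 leaves the choice of acyclic solver open. One standard choice is Yannakakis-style semijoin evaluation over a join tree. The sketch below assumes that $I^*$ is given together with a join tree whose nodes each carry one constraint, stored as a scope tuple plus a set of allowed tuples. This is our illustration of the well-known acyclic technique, not necessarily the procedure used in [21]. The names `semijoin` and `solve_acyclic` are hypothetical.

```python
def semijoin(left, right):
    """left ⋉ right: rows of `left` that agree with some row of `right` on shared variables."""
    lscope, lrows = left
    rscope, rrows = right
    shared = [v for v in lscope if v in rscope]
    keys = {tuple(row[rscope.index(v)] for v in shared) for row in rrows}
    kept = {row for row in lrows
            if tuple(row[lscope.index(v)] for v in shared) in keys}
    return (lscope, kept)


def solve_acyclic(rels, parent, root):
    """Solve an acyclic CSP given with a join tree.

    rels:   node -> (scope tuple, set of allowed value tuples)
    parent: node -> parent node in the join tree (the root has no entry)
    Returns a dict variable -> value, or None if unsatisfiable.
    """
    children = {}
    for n, p in parent.items():
        children.setdefault(p, []).append(n)
    order, stack = [], [root]          # pre-order: parents before children
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(children.get(n, []))
    rels = dict(rels)
    for n in reversed(order):          # bottom-up semijoin pass
        if n != root:
            rels[parent[n]] = semijoin(rels[parent[n]], rels[n])
    if not rels[root][1]:
        return None                    # an empty relation propagated to the root
    assignment = {}
    for n in order:                    # top-down: extend a consistent assignment
        scope, rows = rels[n]
        for row in rows:
            if all(assignment.get(v, val) == val for v, val in zip(scope, row)):
                assignment.update(zip(scope, row))
                break
        else:
            return None                # cannot happen if `parent` is a valid join tree
    return assignment


rels = {0: (("X", "Y"), {(1, 2), (2, 3)}), 1: (("Y", "Z"), {(3, 4)})}
print(solve_acyclic(rels, {1: 0}, 0))  # {'X': 2, 'Y': 3, 'Z': 4}
```

After the bottom-up pass, every surviving tuple of a node extends to its whole subtree, so the top-down pass never needs to backtrack.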

Theorem 17 ([21]). Given a CSP instance $I \in CSP(HW[k])$ and a $k$-width
hypertree decomposition of $\mathcal{H}_I$ in normal form, $I$ is solvable in $O(\ldots \log \|I\|)$
time.

We have also determined the precise computational complexity of solving
CSP instances having bounded hypertree width.

Theorem 18 ([19]). $CSP(HW[k])$ is LOGCFL-complete.

8 Game Theoretic Characterization of Hypertree Width 

In [36], graphs G of treewidth k are characterized by the so-called Robber-and-Cops
game where k+1 cops have a winning strategy for capturing a robber on G.
Cops can control vertices of a graph and can jump at each move to arbitrary
vertices. The robber can move (at infinite speed) along paths of G but cannot go
over vertices controlled by a cop. It is, moreover, shown that a winning strategy
for the cops exists iff the cops can capture the robber in a monotonic way,
i.e., never returning to a vertex that a cop has previously vacated, which implies
that the moving area of the robber is monotonically shrinking. For more detailed
descriptions of the game, see [36] or [23].

In order to provide a similarly natural characterization for hypertree width,
we defined in [23] a new game, the Robber and Marshals game (R&Ms game).
A marshal is more powerful than a cop. While a cop can control a single vertex
(=variable) only, a marshal controls an entire hyperedge. In the R&Ms game, the
robber moves on vertices just as in the robber and cops game, but now marshals
instead of cops are chasing her. During a move of the marshals from the set of
hyperedges $E$ to the set of hyperedges $E'$, the robber cannot pass through the
vertices in $B = (\bigcup E) \cap (\bigcup E')$, where, for a set of hyperedges $E$, $\bigcup E$ denotes the
union of all hyperedges in $E$. Intuitively, the vertices in $B$ are those not released
by the marshals during the move. As in the monotonic robber and cops game,
it is required that the marshals capture the robber by monotonically shrinking
the moving space of the robber. The game is won by the marshals if they corner
the robber somewhere in the hypergraph.
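The movement rule for a single marshal move can be checked mechanically. The sketch below uses a hypothetical helper, `robber_reach`, with hypergraphs encoded as lists of frozensets and marshal positions as lists of hyperedges. It computes the vertices the robber can still occupy after the marshals move from $E$ to $E'$: she runs along hyperedges, never through $B = (\bigcup E) \cap (\bigcup E')$, and must end outside $\bigcup E'$. An empty result means she is captured. The sketch assumes the robber starts outside $\bigcup E$, and it checks only one move, not the monotonicity of a whole strategy.

```python
def robber_reach(edges, E_old, E_new, robber):
    """Positions available to the robber once the marshals move from E_old to E_new."""
    union_old = frozenset().union(*E_old)
    union_new = frozenset().union(*E_new)
    blocked = union_old & union_new          # B: vertices never released during the move
    seen, stack = {robber}, [robber]
    while stack:
        x = stack.pop()
        for h in edges:
            if x in h:
                # two vertices are neighbours if they share a hyperedge
                for y in h - blocked:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
    return frozenset(seen) - union_new       # vertices under E_new are occupied


path = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
print(sorted(robber_reach(path, [frozenset({1, 2})], [frozenset({3, 4})], 4)))  # [1, 2]
```

In the usage example the old and new positions share no vertex, so $B$ is empty and the robber escapes to the left end of the path.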

Example 19. Let us play the robber-and-marshals game on the hypergraph
of query $Q_1$ of Example 4 (see Fig. 1). We can easily recognize that two marshals
can always capture the robber and win the game by using the following strategy.
Independently of the initial position of the robber, the two marshals initially
move on edges $\{a, b\}$, and thus control the vertices (=variables) T, X, S, R, P,
Y, U, as shown in Figure 6.A. After this move of the marshals, the robber may be
in V, in Z or in W. If the robber is on V, then the marshals move on edge f
and capture the robber, as shown in Figure 6.B (note that the robber cannot
escape from V during this move, as both P and R, the only possible ways to
leave V, are kept under the marshals’ control during the move). Otherwise, if
the robber is on W or on Z, then the marshals move on $\{g, c\}$ (see Figure 6.C).
Since they keep control of X, Y, T, and U during the move, the robber
can escape only to vertex W. Therefore, a further move on edge d allows the
marshals to eventually capture the robber, as shown in Figure 6.D.





Fig. 6. (A) The first move of the marshals playing the game on the hypergraph of $Q_1$; (B) move
of the marshals if the robber stands on V (capture position); (C) move of the
marshals if the robber stands on W or on Z; (D) the marshals capture the robber
in W.



In [23] we prove that there is a one-to-one correspondence between winning
strategies for k marshals and hypertree decompositions of width at most k in a
certain normal form.

Theorem 20 ([23]). A hypergraph $\mathcal{H}$ has $k$-bounded hypertree width if and
only if k marshals have a winning strategy for the R&Ms game played on $\mathcal{H}$.

9 Logical Characterization of Hypertree Width 

Denote by L the fragment of first-order logic (FO) whose connectives are
restricted to existential quantification and conjunction (in particular, $\forall$ and $\vee$ are
disallowed). Kolaitis and Vardi [30] proved that the class of all queries having
treewidth $< k$ coincides in expressive power with the $k$-variable fragment of
L, i.e., the class of all L formulas that use $k$ variables only. See also [13].

In [23], we characterize $HW[k]$ in terms of a guarded logic. We show that
$HW[k] = GF_k(L)$, where $GF_k(L)$ denotes the $k$-guarded fragment of L. The
1-guarded fragment coincides with the classical notion of guardedness, where
existentially quantified subformulas $\psi$ of a formula are always conjoined with a
guard, i.e., an atom containing all free variables of $\psi$. In the $k$-guarded fragment,
up to $k$ atoms may jointly act as a guard (for precise definitions, see [23]). For
the particular case $k = 1$, this gives us a new characterization of the acyclic
queries stating that the acyclic queries are precisely those expressible in the
guarded fragment of L. In order to prove these results, we played the robber and
marshals game on the appropriate query hypergraphs.



10 Comparison of Hypertree Width with Other Methods 

We report on results comparing the hypertree decomposition method with
other methods for efficiently solving CSPs and conjunctive queries, which are
based only on the structure of the hypergraph associated with the problem (we
consider tractability due to restricted structure, as discussed in Section 2). We
call these methods decomposition methods (DM), because each one provides a
decomposition which transforms any hypergraph to an acyclic hypergraph. For
each decomposition method D, this transformation depends on a parameter
called D-width. Let $k$ be a fixed constant. The tractability class $C(D,k)$ is the
(possibly infinite) set of hypergraphs having D-width $\le k$. D ensures that every
CQ or CSP instance whose associated hypergraph belongs to this class is
polynomial-time solvable.

The main decomposition methods considered in database theory and in artificial
intelligence are: Treewidth [34] (see also [30,17]), Cycle Cutset [9], Tree
Clustering [10], Induced Width ($w^*$), cf. [9], Hinge Decomposition [27,26], Hinge
Decomposition + Tree Clustering [26], Cycle Hypercutset [21], and Hypertree
Decomposition. All methods are briefly explained in [21]. Here, we do not consider
the notion of query width, because deciding whether a hypergraph has bounded
query width is NP-complete. However, recall that this notion is generalized by
hypertree width, in that whenever a hypergraph has query width at most $k$, it
has hypertree width at most $k$, too. The converse does not hold, in general [19].
For comparing decomposition methods we introduce the relations $\preceq$, $\rhd$, and $\prec$,
defined as follows:

$D_1 \preceq D_2$, in words, $D_2$ generalizes $D_1$, if $\exists \delta \ge 0$ such that, $\forall k > 0$,
$C(D_1, k) \subseteq C(D_2, k + \delta)$. Thus $D_1 \preceq D_2$ if every class of CSP instances which
is tractable according to $D_1$ is also tractable according to $D_2$.

$D_1 \rhd D_2$ ($D_1$ beats $D_2$) if there exists an integer $k$ such that $\forall m$, $C(D_1, k) \not\subseteq
C(D_2, m)$. To prove that $D_1 \rhd D_2$, it is sufficient to exhibit a class of hypergraphs
contained in some $C(D_1, k)$ but in no $C(D_2, j)$ for $j > 0$. Intuitively, $D_1 \rhd D_2$
means that at least on some class of CSP instances, $D_1$ outperforms $D_2$.
$D_1 \prec D_2$ if $D_1 \preceq D_2$ and $D_2 \rhd D_1$. In this case we say that $D_2$ strongly
generalizes $D_1$.

Mathematically, $\preceq$ is a preorder, i.e., it is reflexive and transitive but not
antisymmetric. We say that $D_1$ is $\preceq$-equivalent to $D_2$, denoted $D_1 \equiv D_2$, if both $D_1 \preceq D_2$
and $D_2 \preceq D_1$ hold.

The decomposition methods $D_1$ and $D_2$ are strongly incomparable if
both $D_1 \rhd D_2$ and $D_2 \rhd D_1$. Note that if $D_1$ and $D_2$ are strongly incomparable,
then they are also incomparable w.r.t. the relations $\preceq$ and $\prec$.

Figure 7 shows a representation of the hierarchy of DMs determined by the
$\preceq$ relation. Each element of the hierarchy represents a DM, apart from that
containing the three $\preceq$-equivalent methods Tree Clustering, Treewidth, and $w^*$.

Theorem 21 ([21]). For each pair $D_1$ and $D_2$ of decomposition methods
represented in Figure 7, the following holds. There is a directed path from $D_1$ to $D_2$
iff $D_1 \prec D_2$, i.e., iff $D_2$ strongly generalizes $D_1$. Moreover, $D_1$ and $D_2$ are
not linked by any directed path iff they are strongly incomparable. Hence, Fig. 7
completely describes the relationships among the different methods.




Fig. 7. Constraint tractability hierarchy 



Recently, a comparison between hypertree width and Courcelle’s concept of
clique-width [7,8] was made [24]. Given that clique-width is defined for graphs,
it had to be suitably adapted to hypergraphs. Defining the clique-width of a
hypergraph $\mathcal{H}$ as the clique-width of its primal graph makes no sense in the
context of CSP tractability, because then CSPs of bounded clique-width would be
intractable. Therefore, in [24], the clique-width of $\mathcal{H}$ is defined as the clique-width
of its incidence graph $G_i(\mathcal{H})$. With this definition it could be shown in [24] that
(a) CSPs whose hypergraphs have bounded clique-width are tractable, and (b)
bounded hypertree width strongly generalizes bounded clique-width.
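To make the two candidate graphs concrete, the sketch below builds both from a hypergraph encoded as a list of frozensets (our assumption). The function names are hypothetical. A single hyperedge over n variables becomes a clique $K_n$ in the primal graph but only a star with n edges in the incidence graph. The primal graph thus discards which variables were grouped into which constraint.

```python
from itertools import combinations


def primal_graph(edges):
    """Primal graph: variables X != Y are adjacent iff some hyperedge contains both."""
    return {frozenset(pair) for h in edges for pair in combinations(h, 2)}


def incidence_graph(edges):
    """Incidence graph: bipartite, hyperedge node ("e", i) joined to each variable of edge i."""
    return {(("e", i), v) for i, h in enumerate(edges) for v in h}


H = [frozenset("abcd")]
print(len(primal_graph(H)), len(incidence_graph(H)))  # 6 4
```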



11 Open Problems 

Several questions are left for future research. In particular, it would be
interesting to know whether the method of hypertree decompositions can be further
generalized. For instance, let us define the concept of generalized hypertree
decomposition by just dropping condition 4 from the definition of hypertree
decomposition (Def. 11). Correspondingly, we can introduce the concept of
generalized hypertree width $ghw(\mathcal{H})$ of a hypergraph $\mathcal{H}$. We know that all classes
of Boolean queries having bounded ghw can be answered in polynomial time.
But we currently do not know whether these classes of queries are polynomially
recognizable. This recognition problem is related to the mysterious hypergraph
sandwich problem [31], which has remained unsolved for a long time. If the latter
is polynomially solvable, then queries of bounded ghw are also polynomially
recognizable. Another question concerns the time complexity of recognizing queries
of bounded hypertree width. Is this problem fixed-parameter tractable, as is
the recognition of graphs of bounded treewidth?
Abstract. We study the expressive power of non-size-increasing recursive
definitions over lists. This notion of computation is such that the size of
all intermediate results will automatically be bounded by the size of the
input, so that the interpretation in a finite model is sound with respect to
the standard semantics. Many well-known algorithms with this property,
such as the usual sorting algorithms, are definable in the system in the
natural way. The main result is that a characteristic function is definable
if and only if it is computable in time $O(2^{p(n)})$ for some polynomial $p$.
The method used to establish the lower bound on the expressive power
also shows that the complexity becomes polynomial time if we allow
primitive recursion only. This settles an open question posed in [1,6].
The key tool for establishing upper bounds on the complexity of derivable
functions is an interpretation in a finite relational model whose
correctness with respect to the standard interpretation is shown using a
semantic technique.
Consider the following recursive definition of a function on lists:

twice(nil) = nil
twice(cons(x, l)) = cons(tt, cons(tt, twice(l)))      (1)

Here nil denotes the empty list, cons(x, l) denotes the list with first element x
and remaining elements l, and tt, ff are the members of a type T of truth values.
We have that twice(l) is a list of length $2 \cdot |l|$, where $|l|$ is the length of l. Now
consider

exp(nil) = cons(tt, nil)
exp(cons(x, l)) = twice(exp(l))      (2)
J. Sgall, A. Pultr, and P. Kolman (Eds.): MFCS 2001, LNCS 2136, pp. 58—61, 2001. 
We have $|exp(l)| = 2^{|l|}$, and further iteration leads to elementary growth rates.
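The growth claimed for (1) and (2) is easy to observe by transcribing the two definitions into Python. Python lists stand in for cons-lists here, and `TT` is our placeholder for the truth value tt. This is only an illustration of the equations, not of any particular typed system.

```python
TT = True  # stands for the truth value tt


def twice(l):
    """twice(nil) = nil; twice(cons(x, l)) = cons(tt, cons(tt, twice(l)))."""
    if not l:
        return []
    return [TT, TT] + twice(l[1:])


def exp(l):
    """exp(nil) = cons(tt, nil); exp(cons(x, l)) = twice(exp(l))."""
    if not l:
        return [TT]
    return twice(exp(l[1:]))


print([len(exp([TT] * n)) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
```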

This shows that innocuous-looking recursive definitions can lead to enormous
growth. In order to prevent this from happening it has been suggested
in [?,10] to rule out definitions like (2) above, where a recursively defined function,
here twice, is applied to the result of a recursive call. Indeed, it has been
shown that such a discipline restricts the definable functions to the polynomial-time
computable ones and moreover every polynomial-time computable function
admits a definition in this style.

Many naturally occurring algorithms, however, do not fit this scheme. Consider,
for instance, the definition of insertion sort:

insert(x, nil) = cons(x, nil)
insert(x, cons(y, l)) = if x ≤ y then cons(x, cons(y, l)) else cons(y, insert(x, l))
sort(nil) = nil
sort(cons(x, l)) = insert(x, sort(l))      (3)

Here, just as in (2) above, we apply a recursively defined function (insert) to
the result of a recursive call (sort), yet no exponential growth arises.

It has been argued in [3] and [6] that the culprit is definition (1), because
it defines a function that increases the size of its argument, and that
non-size-increasing functions can be arbitrarily iterated without leading to exponential
growth.

In [3] a number of partly semantic criteria were offered which allow one to
recognise when a function definition is non-size-increasing. In [6] we have given
syntactic criteria based on linearity (bound variables are used at most once) and
a so-called resource type ◇ which counts constructor symbols such as “cons” on
the left-hand side of an equation.

This means that cons becomes a ternary function taking one argument of
type ◇, one argument of some type A (the head) and a third argument of type
L(A), the tail. There being no closed terms of type ◇, the only way to apply cons
is within a recursive definition; for instance, we can write

append(nil, l2) = l2
append(cons(d, a, l1), l2) = cons(d, a, append(l1, l2))      (4)

Alternatively, we may write

append(l1, l2) = match l1 with nil ⇒ l2 | cons(d, a, l) ⇒ cons(d, a, append(l, l2))      (5)

We notice that the following attempted definition of twice is illegal as it violates
linearity (the bound variable d is used twice):

twice(nil) = nil
twice(cons(d, x, l)) = cons(d, tt, cons(d, tt, twice(l)))      (6)

The definition of insert, on the other hand, is in harmony with linearity
provided that insert gets an extra argument of type ◇ and, moreover, we assume
that the inequality test returns its arguments for subsequent use.
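The following Python sketch mimics the resource discipline only at run time. Python does not check linearity, so every function below is written by hand to use each diamond at most once. The names `Diamond`, `Cons`, `from_list` and `to_list` are our own illustrative choices, not syntax of the systems in [6] or [7]. `Cons` demands a `Diamond`, and `append`, `insert` and `sort` never create one, so they can only recycle the diamonds of their input and cannot increase the number of list cells. By contrast, the illegal twice of (6) would need a second diamond per cell.

```python
class Diamond:
    """A token for one unit of memory (the resource type ◇); counts allocations."""
    created = 0

    def __init__(self):
        Diamond.created += 1


class Cons:
    """A list cell cons(d, head, tail) that must be paid for with a Diamond."""
    __slots__ = ("d", "head", "tail")

    def __init__(self, d, head, tail):
        if not isinstance(d, Diamond):
            raise TypeError("cons needs a diamond")
        self.d, self.head, self.tail = d, head, tail


NIL = None


def from_list(xs):
    """Build an input list: the only place where fresh diamonds are created."""
    l = NIL
    for x in reversed(xs):
        l = Cons(Diamond(), x, l)
    return l


def to_list(l):
    out = []
    while l is not NIL:
        out.append(l.head)
        l = l.tail
    return out


def append(l1, l2):
    # append(cons(d, a, l1), l2) = cons(d, a, append(l1, l2)): d is reused once.
    if l1 is NIL:
        return l2
    return Cons(l1.d, l1.head, append(l1.tail, l2))


def insert(d, x, l):
    # The extra diamond d pays for the cell that will hold x.
    if l is NIL:
        return Cons(d, x, NIL)
    if x <= l.head:
        return Cons(d, x, Cons(l.d, l.head, l.tail))
    return Cons(l.d, l.head, insert(d, x, l.tail))


def sort(l):
    # sort(cons(d, x, l)) = insert(d, x, sort(l)): the input cell's diamond is passed on.
    if l is NIL:
        return NIL
    return insert(l.d, l.head, sort(l.tail))


xs = from_list([3, 1, 2])
before = Diamond.created
print(to_list(sort(xs)), Diamond.created - before)  # [1, 2, 3] 0
```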
The main result of [6] and [1] was that all functions thus definable by structural
recursion are polynomial-time computable even when higher-order functions
are allowed. In [7] it has been shown that general-recursive first-order
definitions admit a translation into a fragment of the programming language
C without dynamic memory allocation (“malloc”), which, on the one hand,
allows one to automatically construct imperative implementations of algorithms
on lists which do not require extra space or garbage collection. More precisely,
this translation maps the resource type ◇ to the C type void * of pointers. The
cons function is translated into the C function which extends a list by a given
value using a provided piece of memory. It is proved that the pointers arising
as denotations of terms of type ◇ always point to free memory space, which can
thus be safely overwritten.

On the other hand, this translation also demonstrates that all definable functions are
computable on a Turing machine with linearly bounded work tape and an unbounded
stack (to accommodate general recursion), which by a result of Cook¹ [4] equals
the complexity class $DTIME(2^{O(n)})$. It was also shown in [7] that any such
function admits a representation.

In the presence of higher-order functions the translation into C breaks down,
as C does not have higher-order functions. Of course, higher-order functions can
be simulated as closures, but this then requires arbitrary amounts of space, as
closures can grow proportionally to the runtime. In a system based on structural
recursion such as [6] this is not a problem, as the runtime is polynomially bounded
there. The hitherto open question of the complexity of general recursion with
higher-order functions is settled in this work [8]: it is shown to require only a
polynomial amount of space, in spite of the unbounded runtime.

We thus demonstrate that a function is representable with general recursion
and higher-order functions iff it is computable in polynomial space with an
unbounded stack or, equivalently (by Cook’s result), in time $O(2^{p(n)})$ for some
polynomial $p$. The lower bound of this result also demonstrates that indeed all
characteristic functions of problems in P are definable in the structural recursive
system. This settles a question left open in [1,6].

In view of the results presented in the talk (see also [8]), these systems of
non-size-increasing computation thus provide a very natural connection between
complexity theory and functional programming. There is also a connection to
finite model theory in that programs admit a sound interpretation in a finite
model. This improves upon earlier combinations of finite model theory with
functional programming [5], where interpretation in a finite model was achieved
in a brute-force way by changing the meaning of constructor symbols, e.g. the
successor of the largest number N was defined to be N itself. In those systems it
is the responsibility of the programmer to account for the possibility of cut-off
when reasoning about the correctness of programs. In the systems studied here,
linearity and the presence of the resource types automatically ensure that cut-off
never takes place. Formally, it is shown that the standard semantics in an infinite
model agrees with the interpretation in a certain finite model for all well-formed
programs.

¹ This result asserts that if $L(n) \ge \log(n)$ then $DTIME(2^{O(L(n))})$ equals the class of
functions computable by a Turing machine with an $L(n)$-bounded R/W-tape and
an unbounded stack.

Another piece of related work is Jones’ [9], where the expressive power of
cons-free higher-order programs is studied. It is shown there that first-order
cons-free programs define polynomial time, whereas second-order programs
define EXPTIME. This shows that the presence of “cons”, tamed by linearity and
the resource type, changes the complexity-theoretic strength. While loc. cit. also
involves Cook’s above-mentioned result (indeed, this result was brought to the
author’s attention by Neil Jones), the other parts of the proof are quite
different.
Abstract. We discuss some of the recent progress in quantum algorithmics.
We review most of the primary techniques used in proving upper
and lower bounds and illustrate how to apply the techniques to a variety
of problems, including the threshold function, parity, searching and
sorting. We also give a set of open questions and possible future research
directions. Our aim is to give a basic overview, and we include suggestions
for further reading.



1 Introduction 

The most famous quantum algorithms are Shor’s [38] algorithms for integer
factorization and computing discrete logarithms, and Grover’s [28] algorithm for
searching an unordered set of elements. In this paper, we discuss and survey
many of the quantum algorithms found since the discoveries of Shor and Grover.

The algorithms of Shor and Grover, as well as most other existing quantum
algorithms, can be naturally expressed in the so-called black box model. In this
model, we are given some function $f$ as input. The function $f$ is given as a
black box, so that the only knowledge we can gain about $f$ is by asking for its
value on points of its domain. We may think of $f$ as an oracle which, when asked
some question $x$, replies with $f(x)$. Our measure of complexity is the number of
evaluations of $f$ required to solve the problem of interest.

On a quantum computer, the only two types of operations allowed are unitary
operators and measurements. We may without loss of generality assume that all
measurements are performed at the end of the computation, and thus any quantum
algorithm can be modeled as a sequence of unitary operators followed by a
single measurement. Consequently, we need to model our queries to function $f$
so that they are unitary. Let $f : X \to Z$ be any function with $Z = \{0,1\}^m$ for
some integer $m$. Define the unitary operator $O_f$ by

$$|x\rangle|z\rangle|w\rangle \mapsto |x\rangle|z \oplus f(x)\rangle|w\rangle \qquad (1)$$

for all $x \in X$, $z \in Z$ and $w \in Z$, where $z \oplus f(x)$ denotes the binary exclusive-or
of bit-strings $z$ and $f(x)$. Since $O_f$ permutes the computational basis states, it is
unitary; moreover, applying $O_f$ twice is equivalent to applying the identity operator,
so $O_f$ is its own inverse and reversible, as required.
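Because $(z \oplus f(x)) \oplus f(x) = z$, the map (1) sends distinct basis states to distinct basis states and undoes itself. The sketch below checks both properties on a small example. Bit-strings are encoded as integers, and the function `oracle_table` and the sample `f` are illustrative assumptions. The sketch tabulates the action of $O_f$ on basis states only and does not simulate amplitudes.

```python
from itertools import product


def oracle_table(f, n_x, m, n_w):
    """Action of O_f on computational basis states (x, z, w).

    x has n_x bits, z has m bits and the workspace w has n_w bits, all
    encoded as integers; z ⊕ f(x) is bitwise XOR.
    """
    return {(x, z, w): (x, z ^ f(x), w)
            for x, z, w in product(range(2 ** n_x), range(2 ** m), range(2 ** n_w))}


def f(x):
    return (3 * x + 1) % 4          # an arbitrary example f : {0,1}^3 -> {0,1}^2


O = oracle_table(f, n_x=3, m=2, n_w=1)
is_permutation = sorted(O.values()) == sorted(O)
is_involution = all(O[O[s]] == s for s in O)
print(is_permutation, is_involution)  # True True
```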

* Supported in part by Canada’s NSERC and the Pacific Institute for the Mathematical
Sciences.
It changes the content of the second register ($|z\rangle$) conditioned on the value of
the first register ($|x\rangle$). The purpose of the third register ($|w\rangle$) is simply to allow
for extra work-space. We often refer to $O_f$ as the black box or as the oracle. On a
quantum computer, we are given access to operator $O_f$, and our objective is to
use the fewest possible applications of $O_f$ to solve the problem at hand. This
model of computation is often referred to as the quantum black box model, and
sometimes also as quantum decision trees [16].

A quantum algorithm $Q$ that uses $T$ applications of $O_f$ (i.e., uses $T$ queries)
is a unitary operator of the form [16,4,31]

$$Q = (U O_f)^T U \qquad (2)$$

for some unitary operator $U$. We always apply algorithm $Q$ on the initial state $|0\rangle$.
The superposition obtained by applying $Q$ on $|0\rangle$ is $Q|0\rangle$, which is $(U O_f)^T U|0\rangle$.
The operators are applied right to left: first apply $U$, and then iterate $U O_f$ a
total number of $T$ times. After applying $Q$ on $|0\rangle$, we always measure the final
state in the computational basis. The outcome of the measurement is a string of classical
bits, and the rightmost bit(s) of this string form the output of the algorithm.

Prior to the discoveries of Shor and Grover, notable quantum algorithms were
found by Deutsch [20], Deutsch and Jozsa [21], Bernstein and Vazirani [6] and
Simon [39]. Deutsch [20] considered the problem where we are given a function
$f : \{0,1\} \to \{0,1\}$ as a black box $O_f$, and we want to determine whether
$f(0) = f(1)$ or $f(0) \neq f(1)$. Deutsch gave a zero-error quantum algorithm that
uses only 1 application of $O_f$ and outputs the correct answer with probability $\frac{1}{2}$
and the answer “inconclusive” with the complementary probability $\frac{1}{2}$.

The algorithm of Deutsch [20] has been generalized and improved by Deutsch and
Jozsa [21], Cleve, Ekert, Macchiavello and Mosca [18], Tapp [40] and others.
Deutsch’s algorithm, as well as the early algorithms by Bernstein and Vazirani [6]
and Simon [39], are discussed in many introductions to quantum computing,
including the excellent papers by Berthiaume [7], Cleve [17] and Rieffel and
Polak [37]. For a thorough introduction to quantum computing, see the recent
book by Nielsen and Chuang [34].
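Deutsch’s original algorithm is correct only with probability 1/2. The sketch below simulates instead the well-known exact one-query variant, as presented for example in [18]: prepare $|0\rangle|1\rangle$, apply a Hadamard gate to each qubit, query $O_f$ once, apply a Hadamard gate to the first qubit and measure it. The result is 1 exactly when $f(0) \neq f(1)$. It is a plain state-vector simulation over real amplitudes. The function names are our own.

```python
import math

S = 1 / math.sqrt(2)


def hadamard(amp, q):
    """Apply H to qubit q (0 = x register, 1 = z register) of a 2-qubit state."""
    out = dict.fromkeys(amp, 0.0)
    for bits, a in amp.items():
        for b in (0, 1):
            new = list(bits)
            new[q] = b
            # H|v> = sum_b (-1)^(v*b) |b> / sqrt(2)
            sign = -1.0 if bits[q] == 1 and b == 1 else 1.0
            out[tuple(new)] += sign * S * a
    return out


def prob_f0_ne_f1(f):
    """Probability that the x qubit is measured as 1 after one query to O_f."""
    amp = {(x, z): 0.0 for x in (0, 1) for z in (0, 1)}
    amp[(0, 1)] = 1.0                                      # start in |0>|1>
    amp = hadamard(hadamard(amp, 0), 1)
    amp = {(x, z ^ f(x)): a for (x, z), a in amp.items()}  # the single query O_f
    amp = hadamard(amp, 0)
    return sum(a * a for (x, _), a in amp.items() if x == 1)


print(round(prob_f0_ne_f1(lambda x: x), 6))  # 1.0
```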

The two most widely used tools for constructing quantum algorithms are
Fourier transforms and amplitude amplification. Fourier transforms are an intrinsic
ingredient in the algorithms of Bernstein and Vazirani [6], Simon [39] and
Shor [38]. See for instance Ivanyos, Magniez and Santha [32] and the references
therein for applications of Fourier transforms in quantum algorithms. Amplitude
amplification was introduced by Brassard and Høyer [9] as a generalization of the
underlying subroutine used in Grover’s algorithm [28]. We discuss this technique
further in Sect. 3, as most of the newer quantum algorithms utilize amplitude
amplification.

In the rest of this paper, we review some of the main techniques used in
proving upper and lower bounds on quantum algorithms. To illustrate the
techniques, we also discuss nine concrete problems, each of which is, we hope,
considered fundamental. Section 2 contains an overview of the
nine problems, including the best known upper and lower bounds. We discuss
techniques for proving upper bounds in Sect. 3, and lower bounds in Sect. 4.
We conclude in Sect. 5 by mentioning some open questions and possible future
research directions.

2 Overview of Problems 

Below, we give a list of nine problems that have been considered in the quantum
black box model. The problems fall into 3 groups. The first group consists of
decision problems with solutions depending only on $|f^{-1}(1)|$, the number of
elements being mapped to 1. Let $[N] = \{0, 1, \ldots, N-1\}$. We say a problem $P$
is symmetric if $P(f) = P(f \circ \pi)$ for all inputs $f : [N] \to Z$ and all permutations
$\pi : [N] \to [N]$. All problems in the first group are symmetric. Generally speaking,
the quantum complexities of symmetric problems are much better understood
than those of non-symmetric problems. In the second group, we have put the
fundamental computational tasks of searching and sorting, and finally, in the
third group, some decision problems related to sorting. Table 1 gives the best
known lower and upper bounds for each of these nine problems.

Or  Given Boolean function $f : [N] \to \{0,1\}$, is $|f^{-1}(1)| > 0$?

Thresholds  Given Boolean function $f : [N] \to \{0,1\}$, is $|f^{-1}(1)| > S$?

Majority  Given Boolean function $f : [N] \to \{0,1\}$, is $|f^{-1}(1)| > N/2$?

Parity  Given Boolean function $f : [N] \to \{0,1\}$, is $|f^{-1}(1)|$ odd?

Ordered searching  Given monotone Boolean function $f : [N] \to \{0,1\}$
and promised that $f(N-1) = 1$, output the smallest index $x \in [N]$ so that
$f(x) = 1$.

Sorting  Given function $f : [N] \to Z$, output a permutation $\pi : [N] \to [N]$ so
that $f \circ \pi$ is monotone.

Collision  Given function $f : [N] \to Z$ and promised that $f$ is either 1-to-1
or 2-to-1, decide which is the case.

Element Distinctness  Given a function $f : [N] \to Z$, do there exist distinct
elements $x, y \in [N]$ so that $f(x) = f(y)$?

Claw (monotone case)  Given two monotone functions $f, g : [N] \to Z$, does
there exist $(x, y) \in [N]^2$ so that $f(x) = g(y)$?
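As a reference point for the quantum bounds, most of the problem statements above can be written as brute-force classical Python functions. The sketch encodes an input $f : [N] \to Z$ as a list `f` with `f[x]` equal to $f(x)$, which is our assumption. It reads every value, so it says nothing about query complexity. All function names are hypothetical. `is_symmetric` tests the symmetry condition $P(f) = P(f \circ \pi)$ over all permutations $\pi$ for one fixed input only.

```python
from itertools import permutations


def or_problem(f):             # Or: is |f^{-1}(1)| > 0?
    return sum(f) > 0


def majority(f):               # Majority: is |f^{-1}(1)| > N/2?
    return sum(f) > len(f) / 2


def parity(f):                 # Parity: is |f^{-1}(1)| odd?
    return sum(f) % 2 == 1


def ordered_search(f):         # promise: f monotone Boolean and f(N-1) = 1
    return min(x for x, v in enumerate(f) if v == 1)


def sorting(f):                # a permutation pi such that f o pi is monotone
    return sorted(range(len(f)), key=lambda x: f[x])


def collision(f):              # promise: f is 1-to-1 or 2-to-1
    return "1-to-1" if len(set(f)) == len(f) else "2-to-1"


def element_distinctness(f):   # do distinct x, y exist with f(x) = f(y)?
    return len(set(f)) < len(f)


def claw(f, g):                # does some (x, y) satisfy f(x) = g(y)?
    return bool(set(f) & set(g))


def is_symmetric(P, f):
    """Check P(f) == P(f o pi) for every permutation pi of [N], for one fixed f."""
    return all(P([f[p] for p in pi]) == P(f) for pi in permutations(range(len(f))))


f = [0, 1, 1, 0, 1]
print(or_problem(f), majority(f), parity(f), is_symmetric(parity, f))  # True True True True
```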



3 The Quantum Algorithms 

For the five problems Or, Thresholds, Collision, Element Distinctness
and Claw, the quantum complexities listed in Table 1 are asymptotically smaller
than the corresponding randomized decision tree complexities. In each case, the
speed-up is at most quadratic and is achieved by applying amplitude amplification
and estimation. Many newer quantum algorithms rely on these two techniques,
and we therefore now give a brief description of them and sketch how
to apply them to each of these five problems. For more details on amplitude
amplification and estimation, see Brassard, Høyer, Mosca and Tapp [10].
Table 1. The best known lower and upper bounds in the quantum black box
model for each of the nine problems defined in Sect. 2. These are the asymptotic
bounds for two-sided bounded-error quantum algorithms, with the exception of
the bounds for Ordered Searching, which are for exact quantum algorithms.
The two rightmost columns contain references to the proofs of each non-trivial
lower and upper bound, respectively. We assume the input to each of the last four
problems is given as a comparison matrix. For the threshold problem, $S^+ =
S + 1$ and $S^- = S - 1$. We use $\log^*$ to denote the log-star function defined below,
and $c$ is some (small) constant.



Problem                 Lower                   Upper                     Ref. (lower)   Ref. (upper)
Or                      $\sqrt{N}$              $\sqrt{N}$                [5]            [8,28]
Thresholds              $\sqrt{S^+(N-S^-)}$     $\sqrt{S^+(N-S^-)}$       [4]            [4]
Majority                $N$                     $N$                       [4]
Parity                  $N$                     $N$                       [23,4]
Ordered Searching       $0.220\log_2(N)$        $0.526\log_2(N)$          [31]           [25]
Sorting                 $N\log(N)$              $N\log(N)$                [31]
Collision                                       $N^{1/3}\log(N)$                         [11]
Elem. Distinctness      $\sqrt{N}\log(N)$       $N^{3/4}\log(N)$          [31]           [14]
Claw (monotone case)    $\sqrt{N}$              $N^{1/2}c^{\log^*(N)}$                   [14]



Suppose we want to solve some problem using some quantum algorithm A.
The algorithm A starts on the initial state $|0\rangle$ and produces some final
superposition $|\Psi\rangle = \sum_{i \in \mathbb{Z}} \alpha_i |i\rangle$. We assume A is a quantum algorithm that uses
no measurements during the computation. This assumption is mostly technical
and can be safely ignored for the purposes of this paper, so we do not elaborate
any further on this. What is crucial is that we can somehow deduce from the
output of algorithm A whether or not we have solved the problem. We formalize
this by assuming that we, in addition to A, are given some Boolean function
$\chi : \mathbb{Z} \to \{0,1\}$. We say algorithm A succeeds if a measurement of $|\Psi\rangle$ yields an
integer $i$ so that $\chi(i) = 1$. Let $a$ denote the success probability of A, that is, let
$$a = \sum_{i \in \mathbb{Z} : \chi(i) = 1} |\alpha_i|^2.$$

There are (at least) two types of questions one may consider concerning
algorithm A with respect to function $\chi$. Firstly, we may consider the problem of
finding an integer $i$ so that $\chi(i) = 1$, i.e., finding a solution $i$, and secondly, we
may ask for the value of $a$, i.e., what is the success probability of A? These two
questions concern searching and estimation, respectively.
Amplitude amplification [10] is a technique that allows fast searching on a
quantum computer. On a classical computer, the standard technique for boosting
the probability of success is repetition. By running algorithm A a total number
of $j$ times, the probability of success increases to roughly $ja$ (assuming $ja \ll 1$).
Intuitively, we can think of this strategy as each additional run of algorithm A
boosting the probability of success by an additive amount of roughly $a$. To find
an integer $i$ with $\chi(i) = 1$, we require an expected number of $O(1/a)$ repetitions.
A quantum analogue of boosting the probability of success is to boost
the amplitude of being in a certain subspace of the Hilbert space. Amplitude
amplification is such a technique. It allows us to boost the probability of success
to roughly $j^2 a$ using only $j$ applications of algorithm A and function $\chi$, which
implies that we can find an integer $i$ with $\chi(i) = 1$ by applying A and $\chi$ an expected
number of only $O(1/\sqrt{a})$ times.
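To make the contrast between repetition and amplification concrete, here is a small Python sketch that only evaluates closed-form success probabilities; it does not simulate a quantum computer. It assumes the standard geometric picture of amplitude amplification from [10], in which $j$ rounds turn the success probability $a = \sin^2\theta_a$ into $\sin^2((2j+1)\theta_a)$, which is roughly $(2j+1)^2 a$ while this stays small. The function names are illustrative only.

```python
import math

def classical_success(a: float, j: int) -> float:
    """Probability that at least one of j independent runs of A succeeds."""
    return 1.0 - (1.0 - a) ** j

def amplified_success(a: float, j: int) -> float:
    """Success probability after j amplification rounds, assuming the standard
    rotation picture: sin^2((2j + 1) * theta) with sin^2(theta) = a."""
    theta = math.asin(math.sqrt(a))
    return math.sin((2 * j + 1) * theta) ** 2

def rounds_near_one(a: float) -> int:
    """j = floor(pi / (4 theta)); the angle (2j + 1) * theta is then close to pi/2."""
    theta = math.asin(math.sqrt(a))
    return math.floor(math.pi / (4 * theta))

if __name__ == "__main__":
    a = 1e-4
    for j in (1, 10, 50):
        print(j, classical_success(a, j), amplified_success(a, j))
    print("rounds:", rounds_near_one(a), "versus 1/a =", 1 / a)
```

With $a = 10^{-4}$, about 78 rounds push the amplified success probability close to 1, whereas 78 classical repetitions still succeed with probability below 1%.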

Theorem 1 (Amplitude amplification) [10]. Let A be any quantum algorithm
that uses no measurements, and let $\chi : \mathbb{Z} \to \{0,1\}$ be any Boolean function.
Let $A|0\rangle = \sum_{i \in \mathbb{Z}} \alpha_i |i\rangle$ denote the superposition obtained by running A on the
initial state $|0\rangle$. Let $a = \sum_{i : \chi(i) = 1} |\alpha_i|^2$ denote the probability that a measurement
of the final state $A|0\rangle$ yields an integer $i$ so that $\chi(i) = 1$. Provided $a > 0$,
we can find an integer $i$ with $\chi(i) = 1$ using an expected number of only $O(1/\sqrt{a})$
applications of A, the inverse $A^{-1}$, and function $\chi$.

This theorem is a generalization of Grover’s result [28] that a database can be
searched for a unique element in $O(\sqrt{N})$ queries. The quantum algorithm [8] for
Or can be phrased in these terms: Let A be any quantum algorithm that maps
the initial state $|0\rangle$ to an equally weighted superposition of all possible inputs
to function $f$, $A|0\rangle = \frac{1}{\sqrt{N}} \sum_{i \in [N]} |i\rangle$. If we measure state $A|0\rangle$, our probability
of seeing an integer $i$ so that $f(i) = 1$ is exactly $a = |f^{-1}(1)|/N$, since every $i$ is
equally likely to be measured. Let $\chi = f$. Thus using an expected number of only
$O(1/\sqrt{a}) = O(\sqrt{N/|f^{-1}(1)|})$ applications of function $f$ we can find an integer $i$
so that $f(i) = 1$, provided there is such an $i$. This implies that there is a one-sided
error quantum algorithm for Or that, using only $O(\sqrt{N})$ applications of $f$,
outputs “no” with certainty if $|f^{-1}(1)| = 0$, and “yes” with probability at least some
positive constant if $|f^{-1}(1)| > 0$.
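The following minimal sketch simulates the Or algorithm classically on a real-valued state vector, to show its one-sided error: with no marked element the probability of measuring a marked index is exactly 0, while with one marked element about $\frac{\pi}{4}\sqrt{N}$ rounds make it close to 1. It assumes a phase oracle $|i\rangle \mapsto (-1)^{f(i)}|i\rangle$ and Grover's reflection about the mean as the oracle-independent step, and, for simplicity, that the number of marked elements is known when choosing the round count (Theorem 1 does not need this). The names are illustrative.

```python
import math
from typing import Iterable

def grover_success(N: int, marked: Iterable[int], rounds: int) -> float:
    """Probability of measuring a marked index after `rounds` Grover iterations,
    simulated on a real statevector of length N."""
    marked = set(marked)
    amps = [1.0 / math.sqrt(N)] * N            # A|0>: uniform superposition over [N]
    for _ in range(rounds):
        for i in marked:                        # phase oracle: |i> -> (-1)^{f(i)} |i>
            amps[i] = -amps[i]
        mean = sum(amps) / N                    # reflection about the mean (diffusion)
        amps = [2.0 * mean - x for x in amps]
    return sum(amps[i] ** 2 for i in marked)

def rounds_for(N: int, t: int) -> int:
    """floor(pi / (4 theta)) with sin^2(theta) = t / N; requires t >= 1."""
    theta = math.asin(math.sqrt(t / N))
    return math.floor(math.pi / (4 * theta))

if __name__ == "__main__":
    N = 64
    r = rounds_for(N, 1)                        # about (pi / 4) * sqrt(N)
    print(r, grover_success(N, [5], r), grover_success(N, [], r))
```

Answering “yes” exactly when the measured index $i$ satisfies $f(i) = 1$ gives the one-sided behaviour described above.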

The quantum algorithm [11] for Collision also uses amplitude amplification.
First, pick any subset $B$ of $[N]$ of cardinality $\lceil N^{1/3} \rceil$ and sort $B$ with respect to
its $f$-values using $O(N^{1/3}\log(N))$ comparisons of the form “Is $f(i) < f(j)$?”.
Once $B$ is sorted, check that no two consecutive elements in the sorted list
map to the same value under $f$, using an additional $|B| - 1$ comparisons. If a
collision is found, output “2-to-1” and stop. Otherwise, proceed as follows. Define
$\chi : [N] \to \{0,1\}$ by $\chi(i) = 1$ if and only if $i \notin B$ and $f(i) = f(j)$ for some $j \in B$.
A single evaluation of $\chi$ can be implemented using only $O(\log(N))$ comparisons
of $f$-values since $B$ is sorted. If function $f$ is 2-to-1 then $|\chi^{-1}(1)| = |B|$, and
if $f$ is 1-to-1 then $|\chi^{-1}(1)| = 0$. As in the case of Or, there thus is a one-sided
error quantum subroutine A that, using only $O(\sqrt{N/|B|})$ applications
of $\chi$, outputs “no” with certainty if $|\chi^{-1}(1)| = 0$, and “yes” with probability at
least some positive constant if $|\chi^{-1}(1)| = |B|$. If subroutine A outputs “yes”, then output “2-to-1”
and stop, otherwise output “1-to-1” and stop. The total number of comparisons
of $f$-values is $O(N^{1/3}\log(N)) + O(N^{1/3} - 1) + O(\sqrt{N/N^{1/3}} \times \log(N))$, which is
$O(N^{1/3}\log(N))$ as specified in Table 1.
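Below is a classical Python skeleton of the Collision algorithm just described, included only to make the data flow concrete: sorting $B$, checking consecutive elements, and evaluating $\chi$ with one binary search. The quantum subroutine is replaced by a plain loop over $[N]$, so this sketch has none of the quantum speed-up; taking $B$ to be the first $\lceil N^{1/3} \rceil$ indices is an assumption of the sketch (the text allows any subset).

```python
import bisect
from typing import Callable

def collision_type(f: Callable[[int], int], N: int) -> str:
    """Classical skeleton of the Collision algorithm for f : [N] -> Z promised to be
    1-to-1 or 2-to-1. The quantum search step is replaced by a plain loop."""
    k = 1
    while k ** 3 < N:                           # k = ceil(N^(1/3)) without floating point
        k += 1
    B = list(range(min(k, N)))                  # assumption: B = the first k indices
    pairs = sorted((f(i), i) for i in B)        # sort B by f-value
    values = [v for v, _ in pairs]
    if any(values[t] == values[t + 1] for t in range(len(values) - 1)):
        return "2-to-1"                         # collision found inside B
    in_B = set(B)

    def chi(i: int) -> bool:
        """chi(i) = 1 iff i is not in B and f(i) = f(j) for some j in B (binary search)."""
        if i in in_B:
            return False
        fi = f(i)
        pos = bisect.bisect_left(values, fi)
        return pos < len(values) and values[pos] == fi

    # Quantum version: amplitude amplification finds such an i with O(sqrt(N / |B|))
    # evaluations of chi. Stand-in here: exhaustive classical search over [N].
    return "2-to-1" if any(chi(i) for i in range(N)) else "1-to-1"
```

For example, `collision_type(lambda i: i % 32, 64)` finds no collision inside $B = \{0, 1, 2, 3\}$ but reports “2-to-1” because $\chi(32) = 1$.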

The quantum algorithm [14] for Element Distinctness is similar to the
algorithm for Collision, except that we now require two nested applications of
amplitude amplification. Interestingly, the algorithm for Element Distinctness
uses only $O(N^{3/4}\log(N))$ comparisons, which is not only sublinear in $N$,
but also much less than the number of comparisons required to sort on a quantum
computer. The algorithm [14] for Claw uses $O(N^{1/2} c^{\log^*(N)})$ comparisons
for some (small) constant $c$, with $O(\log^*(N))$ nested applications of amplitude
amplification. The log-star function $\log^*(\cdot)$ is defined as the minimum number
of iterated applications of the logarithm function necessary to obtain a number
less than or equal to 1: $\log^*(M) = \min\{s \geq 0 \mid \log^{(s)}(M) \leq 1\}$, where
$\log^{(s)} = \log \circ \log^{(s-1)}$ denotes the $s$th iterated application of $\log$, and $\log^{(0)}$ is
the identity function.
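The definition of $\log^*$ translates directly into a loop. The sketch below assumes base-2 logarithms, since the text does not fix a base; the extremely slow growth is what makes the factor $c^{\log^*(N)}$ in the Claw bound so mild.

```python
import math

def log_star(M: float) -> int:
    """log*(M) = min{s >= 0 : log^(s)(M) <= 1}, using base-2 logarithms (assumed base)."""
    s = 0
    while M > 1:
        M = math.log2(M)                        # one more iterated application of log
        s += 1
    return s
```

For instance, $\log^*(65536) = 4$ via $65536 \to 16 \to 4 \to 2 \to 1$, and even $\log^*(2^{65536})$ is only 5.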

Amplitude estimation [10] is a technique for estimating the success probability
$a$ of a quantum algorithm A. This technique is used in the quantum
algorithm [4] for Thresholds. On a classical computer, the standard technique
for estimating the probability of success is, again, repetition: If algorithm A
succeeds in $j$ out of $k$ independent runs, we may output $\tilde{a} = j/k$ as an approximation
to $a$. A quantum analogue of estimating the probability of success is to
estimate the amount of amplitude of being in a certain subspace of the Hilbert
space. The following theorem provides a method for doing this.

Theorem 2 (Amplitude estimation) [10]. Let the setup be as in Theorem 1.
There exists a quantum algorithm that, given A, function $\chi$ and an integer $M > 0$,
outputs $\tilde{a}$ ($0 \leq \tilde{a} \leq 1$) such that
$$|\tilde{a} - a| \leq 2\pi \frac{\sqrt{a(1-a)}}{M} + \frac{\pi^2}{M^2}$$
with probability at least $8/\pi^2$. The algorithm uses $O(M)$ applications of A, the
inverse $A^{-1}$, and function $\chi$.
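To see where the constant $8/\pi^2$ comes from, the sketch below computes the exact output distribution of amplitude estimation and sums the probability of the outputs that satisfy the error bound of Theorem 2. It assumes the standard phase-estimation analysis behind [10]: the output is $\tilde{a} = \sin^2(\pi y/M)$, where $y \in \{0, \ldots, M-1\}$ is distributed as in phase estimation of the two eigenphases $\pm\theta_a/\pi$ (each with weight 1/2) and $\sin^2\theta_a = a$. It is a numerical check of the stated bound under that assumption, not a proof, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

def ae_outcome_distribution(a: float, M: int) -> List[Tuple[float, float]]:
    """(estimate, probability) pairs for amplitude estimation with parameter M."""
    omega = math.asin(math.sqrt(a)) / math.pi   # eigenphases are +omega and -omega (mod 1)

    def fejer(delta: float) -> float:
        # |(1/M) * sum_{k<M} exp(2 pi i k delta)|^2, the phase-estimation kernel
        s = math.sin(math.pi * delta)
        if abs(s) < 1e-12:
            return 1.0
        return (math.sin(M * math.pi * delta) / (M * s)) ** 2

    return [(math.sin(math.pi * y / M) ** 2,
             0.5 * (fejer(y / M - omega) + fejer(y / M + omega)))
            for y in range(M)]

def theorem2_bound(a: float, M: int) -> float:
    return 2 * math.pi * math.sqrt(a * (1 - a)) / M + math.pi ** 2 / M ** 2

def success_probability(a: float, M: int) -> float:
    """Total probability that |estimate - a| is within the Theorem 2 bound."""
    bound = theorem2_bound(a, M)
    return sum(p for est, p in ae_outcome_distribution(a, M) if abs(est - a) <= bound)

if __name__ == "__main__":
    for a in (0.01, 0.25, 0.9):
        print(a, success_probability(a, 32), ">=", 8 / math.pi ** 2)
```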

A straightforward classical algorithm for Thresholds is as follows. Let
A denote the algorithm that outputs a random element $x \in [N]$, taken uniformly.
The probability that A succeeds in outputting an $x$ so that $f(x) = 1$ is
exactly $a = |f^{-1}(1)|/N$. Apply A a total number of $k$ times, and let $j$ denote
the number of times A succeeds. If $\lfloor \frac{j}{k}N + \frac{1}{2} \rfloor \geq S$, then output “yes”,
otherwise output “no”. A simple quantum version of this algorithm is as follows.
First, we find an estimate $\tilde{a}$ of $a$ by applying the above theorem with
$M = 100\sqrt{S^+(N - S^-)}$, where $S^+ = S + 1$ and $S^- = S - 1$. Our estimate $\tilde{a}$
satisfies $|\tilde{a} - a| \leq 2\pi\sqrt{a(1-a)}/M + \pi^2/M^2$ with probability at least $8/\pi^2$, which
implies that if $|f^{-1}(1)| \geq S$ then $\lfloor \tilde{a}N + \frac{1}{2} \rfloor \geq S$ with probability at least $8/\pi^2$,
and if $|f^{-1}(1)| \leq S - 1$ then $\lfloor \tilde{a}N + \frac{1}{2} \rfloor \leq S - 1$ with probability at least $8/\pi^2$.
Thus, if $\lfloor \tilde{a}N + \frac{1}{2} \rfloor \geq S$ then output “yes”, otherwise output “no”. The number
of evaluations of $f$ used by this algorithm is $O(M)$, which is $O(\sqrt{S^+(N - S^-)})$
as specified in Table 1.
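The straightforward classical algorithm and the query count of its quantum version can be sketched as follows. The classical part follows the text literally (uniform sampling, rounding $\frac{j}{k}N$ to the nearest integer). The quantum part only evaluates $M$: rounding $M$ up to an integer is an assumption of this sketch, and for small $N$ the constant 100 makes $M$ exceed $N$, since the bound is asymptotic.

```python
import math
import random
from typing import Callable

def classical_threshold(f: Callable[[int], int], N: int, S: int, k: int,
                        rng: random.Random) -> str:
    """Sample k uniform x in [N], count the j successes f(x) = 1, and answer
    'yes' iff floor(j/k * N + 1/2) >= S."""
    j = sum(f(rng.randrange(N)) for _ in range(k))
    return "yes" if math.floor(j / k * N + 0.5) >= S else "no"

def quantum_threshold_queries(N: int, S: int) -> int:
    """M = 100 * sqrt(S^+ (N - S^-)) with S^+ = S + 1 and S^- = S - 1, rounded up."""
    return math.ceil(100 * math.sqrt((S + 1) * (N - (S - 1))))

if __name__ == "__main__":
    rng = random.Random(0)
    print(classical_threshold(lambda x: 1 if x < 50 else 0, 100, 10, 2000, rng))
    print(quantum_threshold_queries(100, 10))
```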

3.1 Other Quantum Algorithms 

The quantum complexities of the four symmetric decision problems in the first 
group in Table 1 are all tight, up to constant factors. The quantum complexities 
of symmetric decision problems are reasonably well-understood, especially since 
the seminal work of Beals, Buhrman, Cleve, Mosca and de Wolf [4]. We also 
have a fair understanding of the quantum complexities of some symmetric non- 
decision problems that are related to statistics. Four examples of such problems 
are: 

Counting Given Boolean function $f : [N] \to \{0,1\}$, compute $|f^{-1}(1)|$.
Minimum Given $f : [N] \to \mathbb{Z}$, find $x \in [N]$ so that $f(x)$ is minimum.

Mean Given $f : [N] \to \mathbb{Z}$, compute $\frac{1}{N}\sum_{x \in [N]} f(x)$.

Median Given function $f : [N] \to \mathbb{Z}$, find $x \in [N]$ so that $f(x)$ has rank $\lfloor N/2 \rfloor$
in $f([N])$.

All of these problems have efficient quantum algorithms: Counting can be 
solved via amplitude estimation [10], Minimum by repeated applications of amplitude
amplification [22], and Mean and Median by using both amplitude
amplification and estimation [33,27,30]. 

Often, amplitude amplification and estimation are used in conjunction with
techniques from “classical” computing: Novak [35] considers the quantum complexity
of integration, Hayes, Kutin and van Melkebeek [29] that of Majority (see
also Alonso, Reingold and Schott [1]), and Ramesh and Vinay [36] that of string
matching.

As mentioned in the introduction, Fourier transforms are also widely used in 
quantum algorithms, the most famous examples being in Shor’s algorithms [38] 
for factoring and finding discrete logarithms. We refer to [32] and the references 
therein for many more examples. The quantum algorithm for ordered searching 
by Farhi, Goldstone, Gutmann and Sipser [25] is one of the few remarkable exam- 
ples of quantum algorithms based on principles seemingly different from those 
found in amplitude amplification and Fourier transforms. A generic bounded-error
quantum algorithm for solving any problem using at most $N/2 + \sqrt{N}$
applications of $f$ is given by van Dam in [19].

4 The Lower Bounds 

Much work has been done on proving lower bounds for the quantum black box
model. If we try to categorize the many approaches, we may consider there to be
two main methods, the first based on inner products, the second on degrees
of polynomials. Very roughly speaking, so far the simplest and tightest lower
bounds have been proven by the former method for bounded-error quantum algorithms,
and by the latter for exact and small-error quantum algorithms.

The primary idea of the latter method is that to any problem P, we can 
associate a polynomial and the degree of that polynomial yields a lower bound 
on the number of queries required by any quantum algorithm solving P. The
method was introduced by Beals, Buhrman, Cleve, Mosca and de Wolf in [4], and
also implicitly used by Fortnow and Rogers in [26]. A beautiful survey of this 
and related methods for deterministic and randomized decision trees is given by 
Buhrman and de Wolf in [16]. See also Buhrman, Cleve, de Wolf and Zalka [13] for 
bounds on the quantum complexity as a function of the required error probability 
via degrees of polynomials. 

The first general method for proving lower bounds for quantum computing 
was introduced by Bennett, Bernstein, Brassard and Vazirani in their influential 
paper [5]. Their technique is nicely described in Vazirani’s exposition [41], where
it is referred to as a “hybrid argument”. Recently, Ambainis [3] introduced a very
powerful lower bound technique based on entanglement considerations, which
he refers to as “quantum arguments”. The techniques in [5] and [3] share many
properties, and we may view both as being based on inner product arguments. 
In the rest of this section, we discuss general properties of lower bound techniques 
based on inner products. 

Suppose we are given one out of two possible states. That is, suppose we are
given state $|\psi\rangle$ and promised that either $|\psi\rangle = |\psi_0\rangle$ or $|\psi\rangle = |\psi_1\rangle$. We want to
find out which is the case. We assume $|\psi_0\rangle$ and $|\psi_1\rangle$ are known states. Then we
may ask: what is the best measurement we can perform on $|\psi\rangle$ when attempting to
correctly guess whether $|\psi\rangle = |\psi_0\rangle$ or $|\psi\rangle = |\psi_1\rangle$? The answer can be expressed
in terms of the inner product $\langle\psi_0|\psi_1\rangle$.

Lemma 1. Suppose we are given some state $|\psi\rangle$ and promised that either $|\psi\rangle =
|\psi_0\rangle$ or $|\psi\rangle = |\psi_1\rangle$ for some known states $|\psi_0\rangle$ and $|\psi_1\rangle$. Then, for all $0 \leq \epsilon \leq 1/2$,
the following two statements are equivalent.

1. There exists some measurement we can perform on $|\psi\rangle$ that produces one bit $b$
of outcome so that if $|\psi\rangle = |\psi_0\rangle$ then $b = 0$ with probability at least $1 - \epsilon$,
and if $|\psi\rangle = |\psi_1\rangle$ then $b = 1$ with probability at least $1 - \epsilon$.

2. $|\langle\psi_0|\psi_1\rangle| \leq 2\sqrt{\epsilon(1-\epsilon)}$.

Two states can be distinguished with certainty if and only if their inner product 
is zero, and they can be distinguished with high probability if and only if their 
inner product has small absolute value. 
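Solving condition 2 of Lemma 1 for $\epsilon$ gives the smallest achievable error for a given overlap $c = |\langle\psi_0|\psi_1\rangle|$: squaring $c \leq 2\sqrt{\epsilon(1-\epsilon)}$ yields $\epsilon \geq (1 - \sqrt{1 - c^2})/2$. The small Python sketch below evaluates this threshold; the function name is illustrative.

```python
import math

def min_error(c: float) -> float:
    """Smallest eps in [0, 1/2] with c <= 2 * sqrt(eps * (1 - eps)), where
    c = |<psi_0|psi_1>|; this is eps = (1 - sqrt(1 - c^2)) / 2."""
    return (1.0 - math.sqrt(1.0 - c * c)) / 2.0

if __name__ == "__main__":
    for c in (0.0, 0.5, 0.9, 1.0):
        print(c, min_error(c))
```

Orthogonal states ($c = 0$) give error 0, identical states ($c = 1$) give error 1/2, matching the remark above.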

With this, we now give the basic idea in the former lower bound method.
Our presentation follows that of Høyer, Neerbek and Shi [31]. Consider some
decision problem P. Suppose $Q = (UO)^T U$ is some quantum algorithm that
solves P with error probability at most $\epsilon$ using T queries to the oracle. Let
$\epsilon' = 2\sqrt{\epsilon(1-\epsilon)}$. Let $R_0 \subseteq \{f : [N] \to \mathbb{Z} \mid P(f) = 0\}$ be any non-empty subset
of the possible input functions on which the correct answer to problem P is 0.
Similarly, let $R_1 \subseteq \{g : [N] \to \mathbb{Z} \mid P(g) = 1\}$ be any non-empty subset of the
1-inputs.
Initially, the state of the computer is $|0\rangle$. After j iterations, the state is
$|\psi_j^f\rangle = (UO_f)^j U|0\rangle$ if we are given oracle f, and it is $|\psi_j^g\rangle = (UO_g)^j U|0\rangle$ if we
are given oracle g. Suppose that $f \in R_0$ and $g \in R_1$. Then, by Lemma 1, we must have that
$|\langle\psi_T^f|\psi_T^g\rangle| \leq \epsilon'$, since the presumed algorithm outputs 0 with high probability
on every input f in $R_0$, and it outputs 1 with high probability on every input g
in $R_1$. We thus also have that $|\sum_{f \in R_0}\sum_{g \in R_1} \langle\psi_T^f|\psi_T^g\rangle| \leq \epsilon'|R_0||R_1|$.

For each $j \in \{0, 1, \ldots, T\}$, let
$$W_j = \sum_{f \in R_0} \sum_{g \in R_1} \langle\psi_j^f|\psi_j^g\rangle \qquad (3)$$
denote the sum of the inner products after j iterations. We may think of $W_j$ as the
total “weight” after j iterations. Initially, the total weight $W_0 = |R_0||R_1|$ is large,
and after T iterations, the absolute value of the total weight $|W_T| \leq \epsilon'|R_0||R_1|$
is small. The quantity $|W_j - W_{j+1}|$ is a measure of the “progress” achieved by
the $(j+1)$st query.

Theorem 3 ([3]). If $\Delta$ is an upper bound on $|W_j - W_{j+1}|$ for all $0 \leq j < T$,
then the algorithm requires at least $(1 - \epsilon')W_0/\Delta$ queries in computing problem P
with error probability at most $\epsilon$.

Proof. The initial weight is $W_0$, and by the above discussion, $|W_T| \leq \epsilon' W_0$ where
$\epsilon' = 2\sqrt{\epsilon(1-\epsilon)}$. Write $W_0 - W_T = \sum_{j=0}^{T-1}(W_j - W_{j+1})$ as a telescoping sum.
Then $(1 - \epsilon')W_0 \leq |W_0 - W_T| \leq \sum_{j=0}^{T-1} |W_j - W_{j+1}| \leq T\Delta$, and the theorem follows. □

By Theorem 3, we may prove a lower bound on the quantum complexity of
some problem P by proving an upper bound on $\max_j |W_j - W_{j+1}|$ that holds
for any quantum algorithm for P.

Each of the lower bounds for the five problems Or, Thresholds, Majority,
Parity and Claw listed in Table 1 can be proven using this inner product
method. For instance, for Or, let $R_0 = \{f : [N] \to \{0,1\} \mid |f^{-1}(1)| = 0\}$ be
the singleton set consisting only of the function that is identically 0, and let
$R_1 = \{f : [N] \to \{0,1\} \mid |f^{-1}(1)| = 1\}$ consist of the N functions mapping a unique
element to 1. Then $W_0 = |R_0||R_1| = N$. Using these sets, we can show [3,31]
that $|W_j - W_{j+1}| \leq 2\sqrt{N}$ for all $0 \leq j < T$, and thus we require at least
$(1 - \epsilon')N/(2\sqrt{N}) \in \Omega(\sqrt{N})$ queries to the oracle to solve Or with error probability $\epsilon$.
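For one concrete algorithm, Grover's, the weights $W_j$ for these sets $R_0$ and $R_1$ can be computed explicitly. The sketch below simulates the states $|\psi_j^f\rangle$ for the all-zero function and for each of the N functions with a single 1, sums the inner products, and checks the progress bound $|W_j - W_{j+1}| \leq 2\sqrt{N}$. It assumes a phase oracle and takes Grover's diffusion as the oracle-independent step; this is an illustration for one algorithm, not the general argument of [3,31], and the names are illustrative.

```python
import math
from typing import List, Optional

def grover_run(N: int, marked: Optional[int], T: int) -> List[List[float]]:
    """States psi_0, ..., psi_T of Grover's algorithm with a phase oracle for the
    function that is 1 only at `marked` (None stands for the all-zero function)."""
    amps = [1.0 / math.sqrt(N)] * N
    states = [amps[:]]
    for _ in range(T):
        if marked is not None:
            amps[marked] = -amps[marked]        # oracle query
        mean = sum(amps) / N
        amps = [2.0 * mean - x for x in amps]   # oracle-independent diffusion step
        states.append(amps[:])
    return states

def weights(N: int, T: int) -> List[float]:
    """W_j = sum over f in R_0 and g in R_1 of <psi_j^f | psi_j^g> (real amplitudes)."""
    zero = grover_run(N, None, T)
    W = [0.0] * (T + 1)
    for k in range(N):
        g = grover_run(N, k, T)
        for j in range(T + 1):
            W[j] += sum(a * b for a, b in zip(zero[j], g[j]))
    return W

def or_lower_bound(N: int, eps: float) -> float:
    """Theorem 3 with W_0 = N and Delta = 2 sqrt(N): (1 - eps') N / (2 sqrt(N))."""
    eps_prime = 2 * math.sqrt(eps * (1 - eps))
    return (1 - eps_prime) * N / (2 * math.sqrt(N))

if __name__ == "__main__":
    W = weights(16, 10)
    print([round(w, 3) for w in W])
    print(max(abs(W[j] - W[j + 1]) for j in range(10)), "<=", 2 * math.sqrt(16))
```

Here $W_j$ works out to $N\cos(2j\theta)$ with $\sin\theta = 1/\sqrt{N}$, so each query changes the weight by at most $2N\sin\theta = 2\sqrt{N}$.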

The lower bounds for the three problems Ordered Searching, Sorting
and Element Distinctness listed in Table 1 can be proven using a generalization
proposed in [31]. Re-define the weight $W_j$ to be a (possibly non-uniform)
weighted sum of the inner products,
$$W_j = \sum_{f \in R_0} \sum_{g \in R_1} \omega(f,g)\, \langle\psi_j^f|\psi_j^g\rangle, \qquad (4)$$
where $\omega(f,g) \geq 0$ for all oracles $f \in R_0$ and $g \in R_1$. Allowing non-uniform
weights yields lower bounds that are a logarithmic factor better than the
corresponding almost-trivial lower bounds [31].
4.1 Other Lower Bounds

Many more good lower bounds on the quantum black box complexities besides 
the ones we have mentioned so far are known. These include the following. Nayak 
and Wu [33] prove optimal lower bounds for Mean and Median by the polyno- 
mial method. Buhrman and de Wolf [15] give the first non-trivial lower bound 
for Ordered Searching by a reduction from Parity. Farhi, Goldstone, Gutmann,
and Sipser [23] improve this to $\log_2(N)/(2\log_2\log_2(N))$, and Ambainis [2]
shows the first $\Omega(\log(N))$ lower bound. Buhrman, Cleve and Wigderson [12]
prove several lower bounds on quantum black box computing derived from their 
seminal work on quantum communication complexity. Zalka [42] shows that 
Grover’s original algorithm for finding a unique marked element in a database 
is optimal also when considering constant factors and low order terms. 

5 Conclusion and Open Problems 

In this paper, we have tried to present some of the main ideas used in quantum
algorithmics. The reader interested in studying quantum computing further may 
benefit from reading some of the many excellent introductions and reviews. Good 
starting points include [7,16,17,18,34,37,41], all of which can be found at the 
authors’ home pages or on the so-called e-print archive (to which we have given 
a pointer after the references). 

A primary unsolved question in quantum algorithmics is the quantum complexity
of Collision. How many queries are necessary and sufficient for distinguishing
a 1-to-1 function from a 2-to-1 function? Basically, all we know is that
it is easy to distinguish the set of 1-to-1 functions from certain highly structured
subclasses of the 2-to-1 functions [39,9]. Can this result be extended to, for
instance, subclasses based somehow on pseudo-random number generators?

We also find it interesting to compare the quantum black box model with 
other models. Let P be any of the nine problems listed in Table 1, and let T be
the quantum complexity of any known quantum algorithm for P. Then the
randomized decision tree complexity for P is known to be in $O(T^2)$. We are thus
naturally led to ask: for which problems is the randomized decision tree complexity
always at most quadratic in the quantum complexity? A related question is
to consider time-space tradeoffs; see the conclusions in [14,31].

A challenging and interesting quest is finding new quantum algorithms. Many 
existing quantum algorithms seem to benefit from symmetries, periodicities, re- 
peated patterns, etc. in the problems under consideration. Maybe other problems 
that also contain such properties can be solved efficiently on a quantum com- 
puter? Does the lack of such properties rule out efficient quantum algorithms? 
Are there new problems not known to be NP-complete that can be solved in 
polynomial time on a quantum computer? 
Abstract. Decomposition theorems are useful tools for bounding the
convergence rates of Markov chains. The theorems relate the mixing
rate of a Markov chain to smaller, derivative Markov chains, defined by
a partition of the state space, and can be useful when standard, direct
methods fail. Not only does this simplify the chain being analyzed, but
it allows a hybrid approach whereby different techniques for bounding
convergence rates can be used on different pieces. We demonstrate this
approach by giving bounds on the mixing time of a chain on circuits of
length 2n in $\mathbb{Z}^d$.

1 Introduction 

Suppose that you want to sample from a large set of combinatorial objects. A 
popular method for doing this is to define a Markov chain whose state space $\Omega$
consists of the elements of the set, and use it to perform a random walk. We first 
define a graph H connecting pairs of states that are close under some metric. 
This underlying graph on the state space representing allowable transitions is 
known as the Markov kernel. 

To define the transition probabilities of the Markov chain, we need to consider
the desired stationary distribution $\pi$ on $\Omega$. A method known as the Metropolis
algorithm assigns probabilities to the edges of H so that the resulting Markov
chain will converge to this distribution. In particular, if $\Delta$ is the maximum degree
of any vertex in H, and $(x, y)$ is any edge,
$$P(x,y) = \frac{1}{2\Delta} \min\left(1, \frac{\pi(y)}{\pi(x)}\right).$$
We then assign self loops all remaining probability at each vertex, so $P(x,x) \geq
1/2$ for all $x \in \Omega$. If H is connected, $\pi$ will be the unique stationary distribution
of this Markov chain. We can see this by verifying that detailed balance is satisfied
on every edge $(x,y)$, i.e., $\pi(x)P(x,y) = \pi(y)P(y,x)$.

As a result, if we start at any vertex in $\Omega$ and perform a random walk
according to the transition probabilities defined by P, and we walk long enough, 

* Supported in part by NSF Grant No. CGR-9703206. 



Dana Randall* 




J. Sgall, A. Pultr, and P. Kolman (Eds.): MFCS 2001, LNCS 2136, pp. 74–86, 2001.
© Springer-Verlag Berlin Heidelberg 2001 



Decomposition Methods and Sampling Circuits in the Cartesian Lattice 






we will converge to the desired distribution. For this to be useful, we need to
converge rapidly to $\pi$, so that after a small, polynomial number of steps,
our samples will be chosen from a distribution which is provably arbitrarily close 
to stationarity. A Markov chain with this property is rapidly mixing. 

Consider, for example, the set of independent sets I of some graph G. Taking
the Hamming metric, we can define H by connecting any two independent sets
that differ by the addition or deletion of a single vertex. A popular stationary
distribution is the Gibbs distribution, which assigns weight $\pi(I) = \gamma^{|I|}/Z$ to I,
where $\gamma > 0$ is an input parameter of the system, $|I|$ is the size of the independent
set I, and $Z = \sum_J \gamma^{|J|}$ is a normalizing constant known as the partition
function. In the Metropolis chain, we have $P(I, I') = \frac{1}{2\Delta}\min(1, \gamma)$ if $I'$ is formed
by adding a vertex to I, and $P(I, I') = \frac{1}{2\Delta}\min(1, \gamma^{-1})$ if $I'$ is formed by deleting
a vertex from I.
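The Metropolis construction for independent sets is small enough to build exactly for a toy graph. The Python sketch below enumerates the independent sets of a graph, connects sets that differ in one vertex, assigns $\frac{1}{2\Delta}\min(1, \pi(I')/\pi(I))$ to each edge and the remainder to the self-loop, and uses exact fractions so that detailed balance can be checked with equality. The toy graph (a path on three vertices) and $\gamma = 2$ are arbitrary illustrative choices.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List, Tuple

State = FrozenSet[int]

def independent_sets(n: int, edges: List[Tuple[int, int]]) -> List[State]:
    """All independent sets of the graph on vertices 0..n-1 (brute force)."""
    result = []
    for mask in range(1 << n):
        members = frozenset(v for v in range(n) if (mask >> v) & 1)
        if not any(u in members and v in members for u, v in edges):
            result.append(members)
    return result

def metropolis_independent_sets(n: int, edges: List[Tuple[int, int]], gamma: Fraction):
    states = independent_sets(n, edges)
    valid = set(states)
    Z = sum(gamma ** len(s) for s in states)                  # partition function
    pi = {s: gamma ** len(s) / Z for s in states}
    # Markov kernel H: add or delete a single vertex, staying independent
    nbrs = {s: [s ^ {v} for v in range(n) if (s ^ {v}) in valid] for s in states}
    delta = max(len(ns) for ns in nbrs.values())             # maximum degree of H
    P: Dict[State, Dict[State, Fraction]] = {}
    for s in states:
        row = {t: Fraction(1, 2 * delta) * min(Fraction(1), pi[t] / pi[s]) for t in nbrs[s]}
        row[s] = 1 - sum(row.values())                         # self-loop gets the rest
        P[s] = row
    return states, pi, P

if __name__ == "__main__":
    states, pi, P = metropolis_independent_sets(3, [(0, 1), (1, 2)], Fraction(2))
    for s in states:
        print(sorted(s), pi[s], {tuple(sorted(t)): str(p) for t, p in P[s].items()})
```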

Recently there has been great progress in the design and analysis of Markov 
chains which are provably efficient. One of the most popular proof techniques 
is coupling. Informally, coupling says that if two copies of the Markov chain 
can be simultaneously simulated so that they end up in the same state very 
quickly, regardless of the starting states, then the chain is rapidly mixing. In 
many instances this is not hard to establish, which gives a very easy proof of 
fast convergence. 

Despite the appeal of these simple coupling arguments, a major drawback 
is that many Markov chains which appear to be rapidly mixing do not seem to 
admit coupling proofs. In fact, the complexity of typical Markov chains often 
makes it difficult to use any of the standard techniques, which include bounding 
the conductance, the log Sobolev constant or the spectral gap, all closely related 
to the mixing rate. 

The decomposition method offers a way to systematically simplify the Mar- 
kov chain by breaking it into more manageable pieces. The idea is that it should 
be easier to apply some of these techniques to the simplified Markov chains and 
then infer a bound on the original Markov chain. In this survey we will con- 
centrate on the state decomposition theorem which utilizes some partition of the 
state space. It says that if the Markov chain is rapidly mixing when restricted 
to each piece of the partition, and if there is sufficient flow between the pieces 
(defined by a “projection” of the chain), then the original Markov chain must 
be rapidly mixing as well. This allows us to take a top-down approach to mixing
rate analysis, whereby we need only consider the mixing rate of the restrictions
and the projection. In many cases it is easier to define good couplings on these
simpler Markov chains, or to use one of the other known methods of analysis. We
note, however, that using indirect methods such as the decomposition or comparison
(defined later) invariably adds orders of magnitude to the bounds on the
running time of the algorithm. Hence it is wise to use these methods judiciously
unless the goal is simply to establish a polynomial bound on the mixing rate.
2 Mixing Machinery

In what follows, we assume that $\mathcal{M}$ is an ergodic (i.e., irreducible and aperiodic),
reversible Markov chain with finite state space $\Omega$, transition probability
matrix P, and stationary distribution $\pi$.

The time a Markov chain takes to converge to its stationary distribution, i.e., 
the mixing time of the chain, is measured in terms of the distance between the 
distribution at time t and the stationary distribution. Letting $P^t(x,y)$ denote
the t-step probability of going from x to y, the total variation distance at time t
is
$$\|P^t, \pi\|_{tv} = \max_{x \in \Omega} \frac{1}{2} \sum_{y \in \Omega} |P^t(x,y) - \pi(y)|.$$
For $\epsilon > 0$, the mixing time $\tau(\epsilon)$ is
$$\tau(\epsilon) = \min\{t : \|P^{t'}, \pi\|_{tv} \leq \epsilon, \ \forall t' \geq t\}.$$
We say a Markov chain is rapidly mixing if the mixing time is bounded above
by a polynomial in n and $\log(\epsilon^{-1})$, where n is the size of each configuration in the
state space.
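For a tiny chain both definitions can be evaluated by brute force. The Python sketch below computes $\|P^t, \pi\|_{tv}$ by repeated matrix multiplication and returns the first t at which it drops below $\epsilon$; it relies on the standard fact that this distance is non-increasing in t, so the first such t already satisfies the "for all $t' \geq t$" condition. The two-state example chain is an illustrative choice.

```python
from typing import List, Optional

Matrix = List[List[float]]

def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]

def tv_distance(Pt: Matrix, pi: List[float]) -> float:
    """||P^t, pi||_tv = max_x (1/2) * sum_y |P^t(x, y) - pi(y)|."""
    return max(0.5 * sum(abs(row[y] - pi[y]) for y in range(len(pi))) for row in Pt)

def mixing_time(P: Matrix, pi: List[float], eps: float, t_max: int = 100_000) -> Optional[int]:
    """tau(eps): first t with distance <= eps (the distance is non-increasing in t)."""
    n = len(P)
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # P^0 = I
    for t in range(t_max + 1):
        if tv_distance(Pt, pi) <= eps:
            return t
        Pt = mat_mul(Pt, P)
    return None

if __name__ == "__main__":
    P = [[0.75, 0.25], [0.25, 0.75]]
    print(mixing_time(P, [0.5, 0.5], 0.01))
```

For this chain $\|P^t, \pi\|_{tv} = (1/2)^{t+1}$, so $\tau(0.01) = 6$.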

It is well known that the mixing rate is related to the spectral gap of the 
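transition matrix. For the transition matrix P, we let $Gap(P) = \lambda_0 - |\lambda_1|$ denote
its spectral gap, where $\lambda_0, \lambda_1, \ldots, \lambda_{|\Omega|-1}$ are the eigenvalues of P and $1 = \lambda_0 >
|\lambda_1| \geq |\lambda_i|$ for all $i \geq 2$. The following result relates the spectral gap and mixing times
of a chain (see, e.g., [18]).

Theorem 1. Let $\pi_* = \min_{x \in \Omega} \pi(x)$. For all $\epsilon > 0$ we have

(a) $\tau(\epsilon) \leq \frac{1}{Gap(P)} \log\left(\frac{1}{\pi_* \epsilon}\right)$;

(b) $\tau(\epsilon) \geq \frac{|\lambda_1|}{2\,Gap(P)} \log\left(\frac{1}{2\epsilon}\right)$.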

Hence, if l/Gap{P) is bounded above by a polynomial, we are guaranteed 
fast (polynomial time) convergence. For most of what follows we will rely on the 
spectral gap bound on mixing. Theorem 1 is useful for deriving a bound on the 
spectral gap from a coupling proof, which provides bounds on the mixing rate. 
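The two bounds of Theorem 1 are easy to evaluate numerically. The sketch below uses NumPy to compute $Gap(P)$ and $|\lambda_1|$ from the symmetric matrix $D^{1/2} P D^{-1/2}$ with $D = \mathrm{diag}(\pi)$, which has the same eigenvalues as a reversible P, and then returns the lower and upper bounds on $\tau(\epsilon)$. Natural logarithms are an assumption here, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

import numpy as np

def spectral_gap(P, pi: Sequence[float]) -> Tuple[float, float]:
    """(Gap(P), |lambda_1|) for a chain reversible w.r.t. pi."""
    P = np.asarray(P, dtype=float)
    s = np.sqrt(np.asarray(pi, dtype=float))
    S = s[:, None] * P / s[None, :]              # symmetric when P is reversible
    mags = np.sort(np.abs(np.linalg.eigvalsh(S)))[::-1]
    lam1 = float(mags[1])                        # mags[0] is lambda_0 = 1
    return 1.0 - lam1, lam1

def theorem1_bounds(P, pi: Sequence[float], eps: float) -> Tuple[float, float]:
    """Lower bound (b) and upper bound (a) on tau(eps), using natural logarithms."""
    gap, lam1 = spectral_gap(P, pi)
    pi_star = float(min(pi))
    upper = math.log(1.0 / (pi_star * eps)) / gap
    lower = lam1 / (2.0 * gap) * math.log(1.0 / (2.0 * eps))
    return lower, upper

if __name__ == "__main__":
    P = [[0.75, 0.25], [0.25, 0.75]]
    print(spectral_gap(P, [0.5, 0.5]), theorem1_bounds(P, [0.5, 0.5], 0.01))
```

For the two-state chain with $P(x,y) = 1/4$ for $x \neq y$, $Gap(P) = 1/2$, and the bounds bracket its exact mixing time $\tau(0.01) = 6$ (about 1.96 below, 10.6 above).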

We now review some of the main techniques used to bound the mixing
rate of a chain, including the decomposition theorem. 



2.1 Path Coupling 

One of the most popular methods for bounding mixing times has been the coupling
method. A coupling is a Markov chain on $\Omega \times \Omega$ with the following properties.
Instead of updating the pair of configurations independently, the coupling
updates them so that i) the two processes will tend to coalesce, or “move to- 
gether” under some measure of distance, yet ii) each process, viewed in isolation, 
is performing transitions exactly according to the original Markov chain. A valid 
coupling ensures that once the pair of configurations coalesce, they agree from 
that time forward. The mixing time can be bounded by the expected time for
configurations to coalesce under any valid coupling. 

The method of path coupling simplifies our goal by letting us bound the
mixing rate of a Markov chain by considering only a small subset of $\Omega \times \Omega$ [3, 6].

Theorem 2. (Dyer and Greenhill [6]) Let $\Phi$ be an integer valued metric
defined on $\Omega \times \Omega$ which takes values in $\{0, \ldots, B\}$. Let U be a subset of $\Omega \times \Omega$
such that for all $(x,y) \in \Omega \times \Omega$ there exists a path $x = z_0, z_1, \ldots, z_r = y$ between x
and y such that $(z_i, z_{i+1}) \in U$ for $0 \leq i < r$ and
$$\sum_{i=0}^{r-1} \Phi(z_i, z_{i+1}) = \Phi(x,y).$$
Define a coupling $(x,y) \to (x', y')$ of the Markov chain $\mathcal{M}$ on all pairs $(x,y) \in U$.
Suppose that there exists $\alpha < 1$ such that $E[\Phi(x', y')] \leq \alpha\, \Phi(x,y)$ for all $(x,y) \in U$.
Then the mixing time of $\mathcal{M}$ satisfies
$$\tau(\epsilon) \leq \frac{\ln(B\epsilon^{-1})}{1 - \alpha}.$$

Useful bounds can also be derived in the case that a = 1 in the theorem (see 
[6]). 
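As a small worked instance of Theorem 2, consider the lazy walk on the hypercube $\{0,1\}^n$ that picks a coordinate i and a bit b uniformly and sets $x_i = b$. This example chain is not from the text and is chosen only because its path coupling is easy to verify. Let $\Phi$ be Hamming distance (so $B = n$), let U be the pairs at distance 1, and couple by using the same i and b in both copies, so each copy on its own moves exactly as the chain does. The sketch checks exactly that $E[\Phi(x', y')] = 1 - 1/n$ on U, which gives $\tau(\epsilon) \leq n \ln(n/\epsilon)$.

```python
import math
from fractions import Fraction
from itertools import product
from typing import Set, Tuple

Config = Tuple[int, ...]

def hamming(x: Config, y: Config) -> int:
    return sum(a != c for a, c in zip(x, y))

def expected_distance_after_step(x: Config, y: Config) -> Fraction:
    """E[Phi(x', y')] when both copies set coordinate i to bit b, with (i, b) uniform."""
    n = len(x)
    total = Fraction(0)
    for i in range(n):
        for b in (0, 1):
            x2 = x[:i] + (b,) + x[i + 1:]
            y2 = y[:i] + (b,) + y[i + 1:]
            total += Fraction(hamming(x2, y2), 2 * n)
    return total

def contraction_on_U(n: int) -> Set[Fraction]:
    """All values of E[Phi(x', y')] over pairs (x, y) at Hamming distance 1."""
    values = set()
    for x in product((0, 1), repeat=n):
        for c in range(n):
            y = x[:c] + (1 - x[c],) + x[c + 1:]
            values.add(expected_distance_after_step(x, y))
    return values

def path_coupling_bound(B: int, alpha: float, eps: float) -> float:
    """tau(eps) <= ln(B / eps) / (1 - alpha), as in Theorem 2."""
    return math.log(B / eps) / (1 - alpha)

if __name__ == "__main__":
    n = 4
    print(contraction_on_U(n), path_coupling_bound(n, 1 - 1 / n, 0.01))
```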

2.2 The Disjoint Decomposition Method 

Madras and Randall [12] introduced two decomposition theorems which relate 
the mixing rate of a Markov chain to the mixing rates of related Markov chains. 
The state decomposition theorem allows the state space to be decomposed into 
overlapping subsets; the mixing rate of the original chain can be bounded by 
the mixing rates of the restricted Markov chains, which are forced to stay within 
the pieces, and the ergodic flow between these sets. The density decomposition 
theorem is of a similar flavor, but relates a Markov chain to a family of other 
Markov chains with the same Markov kernel, where the transition probabilities 
of the original chain can be described as a weighted average of the transition 
probabilities of the chains in the family. 

We will concentrate on the state decomposition theorem, and will present
a newer version of the theorem due to Martin and Randall [15] which allows
the decomposition of the state space to be a partition, rather than requiring that the
pieces overlap.

Suppose that the state space is partitioned into m disjoint pieces $\Omega_1, \ldots, \Omega_m$.
For each $i = 1, \ldots, m$, define $P_i = P\{\Omega_i\}$ as the restriction of P to $\Omega_i$ which rejects
moves that leave $\Omega_i$. In particular, the restriction to $\Omega_i$ is a Markov chain, $\mathcal{M}_i$,
where the transition matrix $P_i$ is defined as follows: if $x \neq y$ and $x, y \in \Omega_i$
then $P_i(x,y) = P(x,y)$; if $x \in \Omega_i$ then $P_i(x,x) = 1 - \sum_{y \in \Omega_i, y \neq x} P_i(x,y)$. Let
$\pi_i$ be the normalized restriction of $\pi$ to $\Omega_i$, i.e., $\pi_i(A) = \pi(A \cap \Omega_i)/\pi(\Omega_i)$. Notice that
if $\Omega_i$ is connected then $\pi_i$ is the stationary distribution of $P_i$.
Next, define $\overline{P}$ to be the following aggregated transition matrix on the state
space [m]:
$$\overline{P}(i,j) = \frac{1}{\pi(\Omega_i)} \sum_{x \in \Omega_i,\, y \in \Omega_j} \pi(x) P(x,y).$$
Theorem 3. (Martin and Randall [15]) Let $P_i$ and $\overline{P}$ be as above. Then the
spectral gaps satisfy
$$Gap(P) \geq \frac{1}{2}\, Gap(\overline{P}) \min_{i \in [m]} Gap(P_i).$$
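Theorem 3 can be checked numerically on a toy chain. The NumPy sketch below builds the restrictions $P_i$ (rejected moves become self-loops) and the projection $\overline{P}$ exactly as defined above, then compares $Gap(P)$ with $\frac{1}{2} Gap(\overline{P}) \min_i Gap(P_i)$. The example, the lazy Metropolis walk for the uniform distribution on a path of 6 states split into two halves, is an illustrative choice. All chains involved are lazy, so their eigenvalues are non-negative and $\lambda_0 - |\lambda_1|$ coincides with $1 - \lambda_1$.

```python
from typing import List, Sequence, Tuple

import numpy as np

def gap(P: np.ndarray, pi: np.ndarray) -> float:
    """Gap = 1 - |lambda_1| for P reversible w.r.t. pi, via D^{1/2} P D^{-1/2}."""
    s = np.sqrt(pi)
    S = s[:, None] * P / s[None, :]
    mags = np.sort(np.abs(np.linalg.eigvalsh((S + S.T) / 2)))[::-1]
    return float(1.0 - mags[1])

def restriction(P: np.ndarray, piece: List[int]) -> np.ndarray:
    """P_i: keep moves inside the piece and turn rejected moves into self-loops."""
    Pi = P[np.ix_(piece, piece)].copy()
    np.fill_diagonal(Pi, 0.0)
    np.fill_diagonal(Pi, 1.0 - Pi.sum(axis=1))
    return Pi

def projection(P: np.ndarray, pi: np.ndarray, pieces: List[List[int]]) -> np.ndarray:
    """Pbar(i, j) = (1 / pi(Omega_i)) * sum_{x in Omega_i, y in Omega_j} pi(x) P(x, y)."""
    m = len(pieces)
    Pbar = np.zeros((m, m))
    for i, A in enumerate(pieces):
        for j, B in enumerate(pieces):
            Pbar[i, j] = pi[A] @ P[np.ix_(A, B)].sum(axis=1) / pi[A].sum()
    return Pbar

def lazy_path(n: int) -> np.ndarray:
    """Metropolis chain for uniform pi on a path: Delta = 2, so edges get 1/4."""
    P = np.zeros((n, n))
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x, y] = 0.25
        P[x, x] = 1.0 - P[x].sum()
    return P

def check_decomposition(n: int = 6,
                        pieces: Sequence[Sequence[int]] = ((0, 1, 2), (3, 4, 5))
                        ) -> Tuple[float, float, float, List[float]]:
    P = lazy_path(n)
    pi = np.full(n, 1.0 / n)
    parts = [list(A) for A in pieces]
    gap_bar = gap(projection(P, pi, parts), np.array([pi[A].sum() for A in parts]))
    gaps_i = [gap(restriction(P, A), pi[A] / pi[A].sum()) for A in parts]
    return gap(P, pi), 0.5 * gap_bar * min(gaps_i), gap_bar, gaps_i

if __name__ == "__main__":
    print(check_decomposition())
```

Here $Gap(\overline{P}) = 1/6$ and each $Gap(P_i) = 1/4$, so Theorem 3 guarantees $Gap(P) \geq 1/48$, while the true gap is $1/2 - \sqrt{3}/4 \approx 0.067$.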

A useful corollary allows us to replace $\overline{P}$ in the theorem with the Metropolis
chain defined on the same Markov kernel, provided some simple conditions are
satisfied. Since the transitions of the Metropolis chain are fully defined by the
stationary distribution $\overline{\pi}$ of $\overline{P}$ (given by $\overline{\pi}(i) = \pi(\Omega_i)$), this is often easier to analyze than the true projection.

Define $P_M$ on the set [m], with Metropolis transitions $P_M(i,j) = \min\{1, \pi(\Omega_j)/\pi(\Omega_i)\}$.
Let $\partial_i(\Omega_j) = \{y \in \Omega_j : \exists\, x \in \Omega_i \text{ with } P(x,y) > 0\}$.

Corollary 4. [15] With $P_M$ as above, suppose there exist $\beta > 0$ and $\gamma > 0$
such that

(a) $P(x,y) \geq \beta$ whenever $P(x,y) > 0$;

(b) $\pi(\partial_i(\Omega_j)) \geq \gamma\, \pi(\Omega_j)$ whenever $\overline{P}(i,j) > 0$.

Then
$$Gap(P) \geq \frac{\beta\gamma}{2}\, Gap(P_M) \min_{i=1,\ldots,m} Gap(P_i).$$

2.3 The Comparison Method 

When applying the decomposition theorem, we reduce the analysis of a Markov 
chain to bounding the convergence times of smaller related chains. In many cases 
it will be much simpler to analyze variants of these auxiliary Markov chains in- 
stead of the true restrictions and projections. The comparison method tells us 
ways in which we can slightly modify one of these Markov chains without qual- 
itatively changing the mixing time. For instance, it allows us to add additional 
transition edges or to amplify some of the transition probabilities, which can be 
useful tricks for simplifying the analysis of a chain. 

Let $\tilde{P}$ and P be two reversible Markov chains on the same state space $\Omega$
with the same stationary distribution $\pi$. The comparison method allows us to
relate the mixing times of these two chains (see [4] and [17]). In what follows,
suppose that $Gap(\tilde{P})$, the spectral gap of $\tilde{P}$, is known (or suitably bounded) and
we desire a bound on $Gap(P)$, the spectral gap of P, which is unknown.

Following [4], we let $E(P) = \{(x,y) : P(x,y) > 0\}$ and $E(\tilde{P}) = \{(x,y) :
\tilde{P}(x,y) > 0\}$ denote the sets of edges of the two chains, viewed as directed
graphs. For each $(x,y) \in E(\tilde{P})$, define a path $\gamma_{xy}$ using a sequence of states
$x = x_0, x_1, \ldots, x_k = y$ with $(x_i, x_{i+1}) \in E(P)$, and let $|\gamma_{xy}|$ denote the length of
the path. Let $\Gamma(z,w) = \{(x,y) \in E(\tilde{P}) : (z,w) \in \gamma_{xy}\}$ be the set of paths that
use the transition $(z,w)$ of P. Finally, define



A = max 

{z,w)eE(P) 



1 

n{z)P{z, w) 



\lcoy\TT{x)P{x,y) 

r(z,w) 



Theorem 5. (Diaconis and Saloff-Coste [4]) With the above notation, the
spectral gaps satisfy $\mathrm{Gap}(P) \ge \frac{1}{A}\,\mathrm{Gap}(\tilde{P})$.
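The congestion parameter $A$ is easy to evaluate mechanically for small chains. The Python sketch below (hypothetical helper `congestion`, exact arithmetic) takes $\pi$, $P$, $\tilde{P}$ and a user-supplied path for every edge $(x,y)$ of $\tilde{P}$ with $x \neq y$; ignoring self-loops of $\tilde{P}$ is an assumption of the sketch.

```python
from fractions import Fraction


def congestion(pi, P, Pt, paths):
    """A = max over edges (z, w) of P of
    (1 / (pi(z) P(z, w))) * sum over (x, y) whose path uses (z, w) of |path| * pi(x) * Pt(x, y).

    paths[(x, y)] lists the states x = x_0, ..., x_k = y used to simulate
    the edge (x, y) of Pt by edges of P.
    """
    load = {}
    for (x, y), path in paths.items():
        length = len(path) - 1
        for z, w in zip(path, path[1:]):
            assert P[z][w] > 0, "paths may only use edges of P"
            load[(z, w)] = load.get((z, w), 0) + length * pi[x] * Pt[x][y]
    return max(val / (pi[z] * P[z][w]) for (z, w), val in load.items())


F = Fraction
pi = [F(1, 3)] * 3
P = [[F(3, 4), F(1, 4), F(0)],          # lazy walk on the path 0-1-2 (unknown gap)
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 4), F(3, 4)]]
Pt = [[F(1, 3)] * 3 for _ in range(3)]  # complete graph with loops (known gap 1)
paths = {(0, 1): [0, 1], (1, 0): [1, 0], (1, 2): [1, 2], (2, 1): [2, 1],
         (0, 2): [0, 1, 2], (2, 0): [2, 1, 0]}
A = congestion(pi, P, Pt, paths)
```

Here $A = 4$, so Theorem 5 gives $\mathrm{Gap}(P) \ge 1/4$; the path chain has eigenvalues $1, 3/4, 1/4$, so in this toy case the bound happens to be tight.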

It is worthwhile to note that there are several other comparison theorems 
which turn out to be useful, especially when applying decomposition techniques. 
The following lemma helps us reason about a Markov chain by slightly modi- 
fying the transition probabilities (see, e.g., [10]). We use this trick in our main 
application, sampling circuits. 

Lemma 6. Suppose $P$ and $P'$ are Markov chains on the same state space, each
reversible with respect to the distribution $\pi$. Suppose there are constants $c_1$ and $c_2$
such that $c_1 P(x,y) \le P'(x,y) \le c_2 P(x,y)$ for all $x \neq y$. Then
$c_1 \mathrm{Gap}(P) \le \mathrm{Gap}(P') \le c_2 \mathrm{Gap}(P)$.

3 Sampling Circuits in the Cartesian Lattice 

A circuit in $\mathbb{Z}^d$ is a walk along lattice edges which starts and ends at the origin.
Our goal is to sample from $C$, the set of circuits of length $2n$. It is useful to
represent each walk as a string of $2n$ letters using $\{a_1, \ldots, a_d\}$ and their inverses
$\{a_1^{-1}, \ldots, a_d^{-1}\}$, where $a_i$ represents a positive step in the $i$th direction and
$a_i^{-1}$ represents a negative step. Since these are closed circuits, the number of times
$a_i$ appears must equal the number of times $a_i^{-1}$ appears, for all $i$. We will show how
to sample uniformly from the set of all circuits of length $2n$ using an efficient
Markov chain. The primary tool will be finding an appropriate decomposition
of the state space. We outline the proof here and refer the reader to [16] for
complete details.

Using a similar strategy, Martin and Randall showed how to use a Markov
chain to sample circuits in regular $d$-ary trees, i.e., paths of length $2n$ which trace
edges of the tree starting and ending at the origin [15]. This problem generalizes
to sampling Dyck paths according to a distribution which favors walks that hit
the $x$-axis a large number of times, known in the statistical physics community as
“adsorbing staircase walks.” Here too the decomposition method was the basis of
the analysis. We note that there are other simple algorithms for sampling circuits
on trees which do not require Markov chains. In contrast, to our knowledge, the
Markov chain based algorithm discussed in this paper is the first efficient method
for sampling circuits on $\mathbb{Z}^d$.
3.1 The Markov Chain on Circuits

The Markov chain on $C$ is based on two types of moves: transpositions of neighboring
letters in the word (which keep the numbers of each letter fixed) and
rotations, which replace an adjacent pair $(a_i, a_i^{-1})$ with $(a_j, a_j^{-1})$, for some pair of
letters $a_i$ and $a_j$.

We now define the transition probabilities $P$ of $\mathcal{M}$, where we say $x \in_u X$ to
mean that we choose $x$ from the set $X$ uniformly. Starting at $\sigma$, do the following.
With probability 1/2, pick $i \in_u [2n-1]$ and transpose $\sigma_i$ and $\sigma_{i+1}$. With
probability 1/2, pick $i \in_u [2n-1]$ and $k \in_u [d]$, and if $\sigma_i$ and $\sigma_{i+1}$ are inverses (where
$\sigma_i$ is a step in the positive direction), then replace them with $(a_k, a_k^{-1})$. Otherwise
keep $\sigma$ unchanged.

The chain is aperiodic, ergodic and reversible, and the transitions are
symmetric, so the stationary distribution of this Markov chain is the uniform
distribution on $C$.
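The moves of $\mathcal{M}$ can be checked by brute force on a tiny instance. In the Python sketch below, the encoding is our own assumption: a letter is a pair `(k, s)` with `s = +1` for $a_k$ and `s = -1` for $a_k^{-1}$, with 0-based directions. The hypothetical function `transition_row` returns the exact one-step distribution, which lets one verify symmetry of the transitions (and hence uniformity of the stationary distribution) and connectivity for small $n$ and $d$.

```python
from fractions import Fraction
from itertools import product


def circuits(n, d):
    """All closed walks of length 2n in Z^d, as tuples of letters (k, s)."""
    letters = [(k, s) for k in range(d) for s in (1, -1)]
    return [w for w in product(letters, repeat=2 * n)
            if all(sum(s for (k, s) in w if k == j) == 0 for j in range(d))]


def transition_row(word, d):
    """Exact one-step distribution {next_word: probability} of the chain from `word`."""
    L = len(word)
    row = {}

    def add(w, p):
        row[w] = row.get(w, 0) + p

    for i in range(L - 1):
        # With probability 1/2: pick i uniformly and transpose positions i, i+1.
        w = list(word)
        w[i], w[i + 1] = w[i + 1], w[i]
        add(tuple(w), Fraction(1, 2 * (L - 1)))
        # With probability 1/2: pick i and k uniformly; rotate (a_j, a_j^-1) -> (a_k, a_k^-1).
        a, b = word[i], word[i + 1]
        for k in range(d):
            p = Fraction(1, 2 * (L - 1) * d)
            if a[0] == b[0] and a[1] == 1 and b[1] == -1:
                w = list(word)
                w[i], w[i + 1] = (k, 1), (k, -1)
                add(tuple(w), p)
            else:
                add(word, p)  # not a positive step followed by its inverse: stay put
    return row
```

For $n = d = 2$ there are $36$ circuits, and one can check that every row sums to 1 and that $P(\sigma,\tau) = P(\tau,\sigma)$ for all pairs.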

3.2 Bounding the Mixing Rate of the Circuits Markov Chain 

We bound the mixing rate of $\mathcal{M}$ by appealing to the decomposition theorem.
Let $\sigma \in C$ and let $x_i$ equal the number of occurrences of $a_i$, and hence of $a_i^{-1}$,
in $\sigma$, for all $i$. Define the trace $\mathrm{Tr}(\sigma)$ to be the vector $X = (x_1, \ldots, x_d)$. This
defines a partition of the state space into

$$C = \bigcup_X C_X,$$

where the union is over all partitions of $n$ into $d$ pieces and $C_X$ is the set of
words $\sigma \in C$ such that $\mathrm{Tr}(\sigma) = X = (x_1, \ldots, x_d)$. The cardinality of the set $C_X$
is $\binom{2n}{x_1, x_1, \ldots, x_d, x_d}$, the number of distinct words (or permutations) of length $2n$
using the letters with these prescribed multiplicities. The number of sets in the
partition of the state space is exactly the number of partitions of $n$ into $d$ pieces.

Each restricted Markov chain consists of all the words which have a fixed
trace. Hence, transitions in the restricted chains consist only of transpositions,
as rotations would change the trace. The projection $\bar{P}$ consists of a simplex
containing $D$ vertices, each representing a distinct partition of $n$ into $d$ pieces.
Letting

$$\Phi(X,Y) = \tfrac{1}{2}\,\|X - Y\|_1,$$

two points $X$ and $Y$ are connected by an edge of $\bar{P}$ iff $\Phi(X,Y) = 1$, where
$\|\cdot\|_1$ denotes the $\ell_1$ metric. In the following we make no attempt to optimize the
running time, and instead simply provide polynomial bounds on the convergence
rates.

• Step 1 — The restricted Markov chains: Consider any of the restricted 
chains Px on the set of configurations with trace X. We need to show that this 
simpler chain, connecting pairs of words differing by a transposition of adjacent 
letters, converges quickly for any fixed trace. 
We can analyze the transposition moves on this set by mapping $C_X$ to the
set of linear extensions of a particular partial order. Consider the alphabet
$\bigcup_i \{a_{i,1}, \ldots, a_{i,x_i}, A_{i,1}, \ldots, A_{i,x_i}\}$ and the partial order defined by the
relations $a_{i,1} \prec a_{i,2} \prec \cdots \prec a_{i,x_i}$ and $A_{i,1} \prec A_{i,2} \prec \cdots \prec A_{i,x_i}$, for all $i$. It is
straightforward to see that there is a bijection between the set of circuits in
$C_X$ and the set of linear extensions of this partial order (mapping $a_i^{-1}$ to $A_i$).
Furthermore, this bijection preserves transpositions. We appeal to the following
theorem due to Bubley and Dyer [3]:

Theorem 7. The transposition Markov chain on the set of linear extensions of
a partial order on $n$ elements has mixing time $O(n^3(\log n + \log \epsilon^{-1}))$.

Referring to Theorem 1, we can derive the following bound.

Corollary 8. The Markov chain $P_X$ has spectral gap $\mathrm{Gap}(P_X) \ge 1/(c\, n^3 \log n)$
for some constant $c$.

• Step 2 — The projection of the Markov chain: The states $\bar\Omega$ of the
projection consist of partitions of $n$ into $d$ pieces, so $|\bar\Omega| = D$. The stationary
probability $\bar\pi(X)$ of $X = (x_1, \ldots, x_d)$ is proportional to
$\binom{2n}{x_1, x_1, \ldots, x_d, x_d} = (2n)!/\prod_i (x_i!)^2$, the number of words with
these multiplicities.

The Markov kernel is defined by connecting two partitions $X$ and $Y$ if the
distance $\Phi(X,Y) = \|X - Y\|_1/2 = 1$. Before applying Corollary 4 we first need to
bound the mixing rate of the Markov chain defined by Metropolis probabilities.
In particular, if $X = (x_1, \ldots, x_d)$ and $Y = (x_1, \ldots, x_i + 1, \ldots, x_j - 1, \ldots, x_d)$, then

$$P_M(X,Y) = \frac{1}{2n^2}\min\left\{1, \frac{\bar\pi(Y)}{\bar\pi(X)}\right\} = \frac{1}{2n^2}\min\left\{1, \frac{x_j^2}{(x_i+1)^2}\right\}.$$

We analyze this Metropolis chain indirectly by first considering a variant
$P'_M$ which admits a simpler path coupling proof. Using the same Markov kernel,
define the transitions

$$P'_M(X,Y) = \frac{1}{2n^2\,(x_i+1)^2}.$$

In particular, the $x_i$ in the denominator is the value which would be increased
by the rotation. Notice that detailed balance is satisfied:

$$\frac{\bar\pi(X)}{\bar\pi(Y)} = \frac{(x_i+1)^2}{x_j^2} = \frac{P'_M(Y,X)}{P'_M(X,Y)}.$$

This guarantees that $P'_M$ has the same stationary distribution as $P_M$, namely $\bar\pi$.

The mixing rate of this chain can be bounded directly using path coupling.
Let $U \subseteq \bar\Omega \times \bar\Omega$ be the set of pairs of states $X$ and $Y$ such that $\Phi(X,Y) = 1$.
We couple by choosing the same pair of indices $i$ and $j$, and the same bit
$b \in \{-1, 1\}$, to update each of $X$ and $Y$ ($b = 1$ adds 1 to coordinate $i$ and
subtracts 1 from coordinate $j$; $b = -1$ does the reverse), where the probability
for accepting each of these moves is dictated by the transitions of $P'_M$.
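The detailed-balance identity, and the fact that $P_M$ dominates $P'_M$ transition by transition (used below together with Lemma 6), can be verified mechanically on small instances. In this Python sketch all helper names are hypothetical, and the common scale factor `c` is an assumption of the sketch (we pass $1/(2n^2)$); both checked properties hold for any common scale.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def traces(n, d):
    """All X = (x_1, ..., x_d) with nonnegative entries summing to n."""
    return [X for X in product(range(n + 1), repeat=d) if sum(X) == n]


def weight(X):
    """Unnormalized bar_pi(X): the number of words with trace X, (2n)! / prod (x_i!)^2."""
    w = Fraction(factorial(2 * sum(X)))
    for x in X:
        w /= factorial(x) ** 2
    return w


def shift(X, i, j):
    """Y with y_i = x_i + 1 and y_j = x_j - 1."""
    Y = list(X)
    Y[i] += 1
    Y[j] -= 1
    return tuple(Y)


def p_variant(X, i, j, c):
    """P'_M(X, shift(X, i, j)) = c / (x_i + 1)^2, with a hypothetical scale c."""
    return c / (X[i] + 1) ** 2


def p_metropolis(X, i, j, c):
    """P_M(X, Y) = c * min{1, bar_pi(Y) / bar_pi(X)} with the same scale c."""
    return c * min(Fraction(1), weight(shift(X, i, j)) / weight(X))


def check(n, d):
    c = Fraction(1, 2 * n * n)
    for X in traces(n, d):
        for i in range(d):
            for j in range(d):
                if i == j or X[j] == 0:
                    continue
                Y = shift(X, i, j)
                assert weight(Y) / weight(X) == Fraction(X[j] ** 2, (X[i] + 1) ** 2)
                # Detailed balance: the reverse move from Y increases coordinate j.
                assert weight(X) * p_variant(X, i, j, c) == weight(Y) * p_variant(Y, j, i, c)
                assert p_variant(X, i, j, c) <= p_metropolis(X, i, j, c)
    return True
```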
Lemma 9. For any pair $(X_t, Y_t) \in U$, the expected distance after one
step of the coupled chain satisfies $\mathbf{E}[\Phi(X_{t+1}, Y_{t+1})] \le \left(1 - \frac{1}{8n^5}\right)\Phi(X_t, Y_t)$.

Proof. If $(X_t, Y_t) \in U$, then there exist coordinates $k$ and $k'$ such that $y_k = x_k + 1$
and $y_{k'} = x_{k'} - 1$. Without loss of generality, assume that $k = 1$ and $k' = 2$. We
need to determine the expected change in distance after one step of the coupled
chain. Suppose that in this move we try to add 1 to $x_i$ and $y_i$ and subtract 1
from $x_j$ and $y_j$. We consider three cases.

Case 1: If $|\{i,j\} \cap \{1,2\}| = 0$, then both processes accept the move with the
same probability and $\Phi(X_{t+1}, Y_{t+1}) = 1$.

Case 2: If $|\{i,j\} \cap \{1,2\}| = 1$, then we shall see that the expected change is
also zero. Assume without loss of generality that $i = 1$ and $j = 3$, and first
consider the case $b = 1$. Then we move from $X$ to $X' = (x_1 + 1, x_2, x_3 - 1, \ldots, x_d)$
with probability $P'_M(X,X') = \frac{1}{2n^2(x_1+1)^2}$, and from $Y$ to
$Y' = (x_1 + 2, x_2 - 1, x_3 - 1, \ldots, x_d)$ with probability $P'_M(Y,Y') = \frac{1}{2n^2(x_1+2)^2}$.
Since $P'_M(X,X') > P'_M(Y,Y')$, with probability $P'_M(Y,Y')$ we update both $X$ and $Y$;
with probability $P'_M(X,X') - P'_M(Y,Y')$ we update just $X$; and with all remaining
probability we update neither. In the first case we end up with $X'$ and $Y'$, in the
second with $X'$ and $Y$, and in the final case we stay at $X$ and $Y$. All of these pairs
are unit distance apart, so the expected change in distance is zero. If $b = -1$, then
$P'_M(X,X') = P'_M(Y,Y') = \frac{1}{2n^2(x_3+1)^2}$, and again the coupling keeps the
configurations unit distance apart.

Case 3: If $|\{i,j\} \cap \{1,2\}| = 2$, then we shall see that the expected change is
at most zero. Assume without loss of generality that $i = 1$, $j = 2$ and $b = 1$.
The probability of moving from $X$ to $X'' = (x_1 + 1, x_2 - 1, \ldots, x_d) = Y$ is
$P'_M(X,X'') = \frac{1}{2n^2(x_1+1)^2}$, and the probability of moving from $Y$ to
$Y'' = (x_1 + 2, x_2 - 2, \ldots, x_d)$ is $P'_M(Y,Y'') = \frac{1}{2n^2(x_1+2)^2}$. With probability
$P'_M(Y,Y'')$ we update both configurations, keeping them unit distance apart, and with
probability $P'_M(X,X'') - P'_M(Y,Y'') = \frac{2x_1+3}{2n^2(x_1+1)^2(x_1+2)^2} \ge \frac{1}{8n^5}$
we update just $X$, decreasing the distance to zero. When $b = -1$ the symmetric
argument shows that we again have a small chance of decreasing the distance.

Summing over all of these possibilities yields the lemma. □

The path coupling theorem implies that the mixing time is bounded by
$\tau(\epsilon) = O(n^5 \log n)$. Furthermore, we get the following bound on the spectral gap.

Theorem 10. The Markov chain $P'_M$ on $\bar\Omega$ has spectral gap $\mathrm{Gap}(P'_M) \ge c'/(n^5 \log n)$
for some constant $c'$.

This bounds the spectral gap of the modified Metropolis chain $P'_M$, but we
can readily compare the spectral gaps of $P'_M$ and $P_M$ using Lemma 6. Since all
the transitions of $P_M$ are at least as large as those of $P'_M$, we find

Corollary 11. The Markov chain $P_M$ on $\bar\Omega$ has spectral gap $\mathrm{Gap}(P_M) \ge c'/(n^5 \log n)$.
• Step 3 — Putting the pieces together: These bounds on the spectral gaps
of the restrictions $P_X$ and the Metropolis projection $P_M$ enable us to apply the
decomposition theorem to derive a bound on the spectral gap of $P$, the original
chain.

Theorem 12. The Markov chain $P$ is rapidly mixing on $C$ and the spectral gap
satisfies $\mathrm{Gap}(P) \ge c''/(n^{10} d \log^2 n)$, for some constant $c''$.

Proof. To apply Corollary 4, we need to bound the parameters $\beta$ and $\gamma$. We find
that $\beta \ge 1/(4dn)$, the minimum probability of a transition (a rotation is proposed
with probability $\frac{1}{2} \cdot \frac{1}{2n-1} \cdot \frac{1}{d}$). To bound $\gamma$ we need to
determine what fraction of the words in $C_X$ are neighbors of a word in $C_Y$ if
$\Phi(X,Y) = 1$ (since $\pi$ is uniform within each of these sets). If $X = (x_1, \ldots, x_d)$
and $Y = (x_1, \ldots, x_i + 1, \ldots, x_j - 1, \ldots, x_d)$, this fraction is exactly the likelihood
that a word in $C_X$ has an $a_j$ followed by an $a_j^{-1}$, and this is easily determined
to be at least $1/n$.

Combining $\beta \ge 1/(4dn)$ and $\gamma \ge 1/n$ with our bounds from Corollaries 8 and 11,
Corollary 4 gives the claimed lower bound on the spectral gap. □

4 Other Applications of Decomposition 

The key step to applying the decomposition theorem is finding an appropriate 
partition of the state space. In most examples a natural choice seems to be to 
cluster configurations of equal probability together so that the distribution for 
each of the restricted chains is uniform, or so that the restrictions share some 
essential feature which will make it easy to bound the mixing rate. 

In the example of Section 3, the state space is divided into subsets, each
representing a partition of $n$ into $d$ parts. It followed that the vertices of the
projection formed a $d$-dimensional simplex, where the Markov kernel was formed
by connecting vertices which are neighbors in the simplex. We briefly outline
two other recent applications of the decomposition theorem where we get other
natural graphs for the projection. In the first case the graph defining the Markov
kernel of the projection is one-dimensional, and in the second it is a hypercube.

4.1 Independent Sets 

Our first example is sampling independent sets of a graph according to the Gibbs
measure. Recall that $\pi(I) = \gamma^{|I|}/Z_I$, where $\gamma > 0$ is an input parameter and $Z_I$
normalizes the distribution. There has been much activity in studying how to
sample independent sets for various values of $\gamma$ using a simple, natural Markov
chain based on inserting, deleting or exchanging vertices at each step. Works of
Luby and Vigoda [9] and Dyer and Greenhill [5] imply that this chain is rapidly
mixing if $\gamma \le 2/(\Delta - 2)$, where $\Delta$ is the maximum number of neighbors of any
vertex in $G$. It was shown by Borgs et al. [1] that this chain is slowly mixing on
some graphs for $\gamma$ sufficiently large.
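As a concrete illustration of insert/delete/exchange dynamics, here is a minimal Python sketch. Its acceptance rules are simple Metropolis-style choices of our own (an assumption, not the exact rules analyzed in [9] or [5]): pick a vertex $v$ uniformly; if $v \in I$, delete it with probability $\min\{1, 1/\gamma\}$; if $v \notin I$ has no neighbor in $I$, insert it with probability $\min\{1, \gamma\}$; if $v$ has exactly one neighbor $u \in I$, move to $I - u + v$. The hypothetical function `moves` returns the exact one-step distribution, so detailed balance with respect to $\gamma^{|I|}$ can be checked on a small graph.

```python
import random
from fractions import Fraction
from itertools import combinations


def independent_sets(adj):
    """All independent sets of the graph with adjacency lists `adj`, as frozensets."""
    n = len(adj)
    for r in range(n + 1):
        for c in combinations(range(n), r):
            s = frozenset(c)
            if all(u not in s for v in s for u in adj[v]):
                yield s


def moves(adj, state, gamma):
    """Exact one-step distribution {next_state: probability} from `state`."""
    n = len(adj)
    out = {}
    for v in range(n):
        if v in state:                          # deletion
            acc, nxt = min(Fraction(1), 1 / gamma), state - {v}
        else:
            blockers = [u for u in adj[v] if u in state]
            if not blockers:                    # insertion
                acc, nxt = min(Fraction(1), gamma), state | {v}
            elif len(blockers) == 1:            # exchange u -> v (size unchanged)
                acc, nxt = Fraction(1), (state - {blockers[0]}) | {v}
            else:                               # v is blocked by 2+ neighbors: stay
                acc, nxt = Fraction(0), state
        out[nxt] = out.get(nxt, 0) + Fraction(1, n) * acc
        out[state] = out.get(state, 0) + Fraction(1, n) * (1 - acc)
    return out


def step(adj, state, gamma, rng):
    """Sample one move of the chain."""
    dist = moves(adj, state, gamma)
    states = list(dist)
    return rng.choices(states, weights=[float(p) for p in dist.values()])[0]
```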

Alternatively, Madras and Randall [12] showed that this algorithm is fast
for every value of $\gamma$, provided we restrict the state space to independent sets of
size at most $n^* = \lfloor |V|/(2(\Delta + 1)) \rfloor$. This relies heavily on earlier work of Dyer
and Greenhill [6] showing that a Markov chain defined by exchanges is rapidly
mixing on the set of independent sets of fixed size $k$, whenever $k \le n^*$. The
decomposition here is quite natural: we partition $\mathcal{I}$, the set of independent sets
of $G$, into pieces $\mathcal{I}_k$ according to their size. The restrictions arising from this
partition permit exchanges, but disallow insertions and deletions, as these exit the
state space of the restricted Markov chain. These are now exactly the Markov
chains proven to be rapidly mixing by Dyer and Greenhill (but with slightly
greater self-loop probabilities) and hence can also be seen to be rapidly mixing.
Consequently, we need only bound the mixing rate of the projection.

Here the projection is a one-dimensional graph on $\{0, \ldots, n^*\}$. Further
calculation shows that the stationary distribution $\bar\pi(k)$ of the projection is
unimodal in $k$, implying that the projection is also rapidly mixing. We refer the
reader to [12] for details.

4.2 The Swapping Algorithm 

To further demonstrate the versatility and potential of the decomposition 
method, we review an application of a very different flavor. In recent work, 
Madras and Zheng [13] show that the swapping algorithm is rapidly mixing for 
the mean field Ising model (i.e., the Ising model on the complete graph), as well 
as for a simpler toy model. 

Given a graph $G = (V,E)$, the ferromagnetic Ising model on $G$ views the vertices
as particles and the edges as interactions between particles. A spin configuration
is an assignment of spins, either $+$ or $-$, to each of the vertices, where adjacent
vertices prefer to have the same spin. Let $J_{x,y} > 0$ be the interaction energy
between vertices $x$ and $y$, where $(x,y) \in E$. Let $\sigma \in \Omega = \{+,-\}^V$ be any
assignment of spins to the vertices. The Hamiltonian of $\sigma$ is

$$H(\sigma) = \sum_{(x,y) \in E} J_{x,y}\, \mathbf{1}_{\sigma_x \neq \sigma_y},$$

where $\mathbf{1}_A$ is the indicator function which is 1 when the event $A$ is true and 0
otherwise. The probability that the Ising spin state is $\sigma$ is given by the Gibbs
distribution:

$$\pi(\sigma) = \frac{e^{-\beta H(\sigma)}}{Z(G)},$$

where $\beta$ is inverse temperature and

$$Z(G) = \sum_{\sigma \in \Omega} e^{-\beta H(\sigma)}.$$

It is well known that at sufficiently low temperatures the distribution is
bimodal (as a function of the number of vertices assigned $+$), and any local
dynamics will be slowly mixing. The simplest local dynamics, Glauber dynamics, is
the Markov chain defined by choosing a vertex at random and flipping the spin
at that vertex with the appropriate Metropolis probability.
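A minimal Python sketch of Glauber dynamics with Metropolis acceptance for the Hamiltonian above. The encoding is our own assumption: spins are $\pm 1$ integers and the couplings $J_{x,y}$ are a dictionary keyed by edges. A vertex $v$ is chosen uniformly and its spin is flipped with probability $\min\{1, e^{-\beta(H(\sigma^v) - H(\sigma))}\}$, where $\sigma^v$ denotes $\sigma$ with the spin at $v$ flipped; this makes the chain reversible with respect to $\pi$.

```python
import math
import random
from itertools import product


def hamiltonian(sigma, J):
    """H(sigma) = sum over edges (x, y) of J[(x, y)] * 1[sigma_x != sigma_y]."""
    return sum(j for (x, y), j in J.items() if sigma[x] != sigma[y])


def flip(sigma, v):
    s = list(sigma)
    s[v] = -s[v]
    return tuple(s)


def acceptance(sigma, v, J, beta):
    """Metropolis probability of flipping the spin at v."""
    dH = hamiltonian(flip(sigma, v), J) - hamiltonian(sigma, J)
    return 1.0 if dH <= 0 else math.exp(-beta * dH)


def glauber_step(sigma, J, beta, rng):
    v = rng.randrange(len(sigma))  # choose a vertex uniformly at random
    if rng.random() < acceptance(sigma, v, J, beta):
        return flip(sigma, v)
    return sigma


def configurations(n):
    """All spin configurations on n vertices (only sensible for tiny n)."""
    return list(product((1, -1), repeat=n))
```

On the complete graph (the mean-field case discussed below), `J` simply contains every pair $x < y$.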
Simulated tempering, which varies the temperature during the run of an
algorithm, appears to be a useful way to circumvent this difficulty [8, 14]. The
chain moves between $m$ close temperatures that interpolate between the
temperature of interest and a very high temperature, where the local dynamics
converges rapidly. The swapping algorithm is a variant of tempering, introduced by
Geyer [7], where the state space is $\Omega^m$ and each configuration $S = (\sigma_1, \ldots, \sigma_m) \in \Omega^m$
consists of one sample at each temperature. The stationary distribution is
$\pi(S) = \prod_{i=1}^{m} \pi_i(\sigma_i)$, where $\pi_i$ is the distribution at temperature $i$.
The transitions of the swapping algorithm consist of two types of moves: with
probability 1/2 choose $i \in [m]$ and perform a local update of $\sigma_i$ (using Glauber
dynamics at this fixed temperature); with probability 1/2 choose $i \in [m-1]$ and
move from $S = (\sigma_1, \ldots, \sigma_m)$ to $S' = (\sigma_1, \ldots, \sigma_{i+1}, \sigma_i, \ldots, \sigma_m)$, i.e., swap
configurations $i$ and $i+1$, with the appropriate Metropolis probability.
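For Gibbs distributions $\pi_l(\sigma) \propto e^{-\beta_l H(\sigma)}$, the partition functions cancel in the Metropolis ratio of a swap, so the acceptance probability depends only on the two inverse temperatures and the two energies: $\pi(S')/\pi(S) = \exp\big((\beta_i - \beta_{i+1})(H(\sigma_i) - H(\sigma_{i+1}))\big)$. A Python sketch (hypothetical function names; inverse temperatures and current energies are supplied as lists):

```python
import math


def swap_acceptance(betas, energies, i):
    """Metropolis probability of swapping the configurations at levels i and i+1.

    With pi_l(sigma) proportional to exp(-betas[l] * H(sigma)) and energies[l] = H(sigma_l),
    pi(S') / pi(S) = exp((betas[i] - betas[i+1]) * (energies[i] - energies[i+1])).
    """
    log_ratio = (betas[i] - betas[i + 1]) * (energies[i] - energies[i + 1])
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)  # avoids overflow in exp


def log_weight(betas, energies):
    """Log of the unnormalized product measure prod_l exp(-betas[l] * energies[l])."""
    return -sum(b * h for b, h in zip(betas, energies))
```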

The idea behind the swapping algorithm, and other versions of tempering, is 
that, in the long run, the trajectory of each Ising configuration will spend equal 
time at each temperature, potentially greatly speeding up mixing. Experimen- 
tally, this appears to overcome obstacles to sampling at low temperatures. 

Madras and Zheng show that the swapping algorithm is rapidly mixing on
the mean-field Ising model at all temperatures. Let $\Omega^+ \subseteq \Omega$ be the set of
configurations that are predominantly $+$, and similarly define $\Omega^-$. Define the trace of
a configuration $S$ to be $\mathrm{Tr}(S) = (v_1, \ldots, v_m) \in \{+,-\}^m$, where $v_i = +$ if $\sigma_i \in \Omega^+$
and $v_i = -$ if $\sigma_i \in \Omega^-$. The analysis of the swapping chain uses decomposition
by partitioning the state space according to the trace.

The projection for this decomposition is the $m$-dimensional hypercube, where
each vertex represents a distinct trace. The stationary distribution is uniform on
the hypercube because, at each temperature, the probabilities of being in $\Omega^+$ and
$\Omega^-$ are equal by symmetry. Relying on the comparison method, it suffices
to analyze the following simplification of the projection: starting at any vertex
$V = (v_1, \ldots, v_m)$ of the hypercube, pick $i \in_u [m]$. If $i = 1$, then with probability
1/2 flip the first bit; if $i > 1$, then with probability 1/2 transpose $v_{i-1}$
and $v_i$; and with all remaining probability do nothing. This chain is easily seen
to be rapidly mixing on the hypercube and can be used to infer a bound on the
spectral gap of the projection chain.
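The simplified projection is small enough to write down exactly. This Python sketch (hypothetical function name; traces encoded as tuples of $\pm 1$) builds its transition probabilities; every move is its own inverse, so the matrix is symmetric and the uniform distribution on the hypercube is stationary.

```python
from fractions import Fraction
from itertools import product


def simplified_projection(m):
    """Transition probabilities of the simplified chain on {+1, -1}^m."""
    states = list(product((1, -1), repeat=m))
    P = {V: {} for V in states}
    for V in states:
        for i in range(1, m + 1):                  # i uniform in [m], 1-based as in the text
            if i == 1:
                W = (-V[0],) + V[1:]               # flip the first bit
            else:                                  # transpose v_{i-1} and v_i
                W = V[:i - 2] + (V[i - 1], V[i - 2]) + V[i:]
            p = Fraction(1, 2 * m)                 # probability 1/m for i, then 1/2
            P[V][W] = P[V].get(W, 0) + p
            P[V][V] = P[V].get(V, 0) + p           # otherwise do nothing
    return states, P
```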

To analyze the restrictions, Madras and Zheng first prove that the simple
single-flip dynamics on $\Omega^+$ is rapidly mixing at any temperature; this result
is analytical, relying on the fact that the underlying graph is complete for the
mean-field model. Using simple facts about Markov chains on product spaces, it
can be shown that each of the restricted chains must also be rapidly mixing
(even without including any swap moves). Once again decomposition completes
the proof of rapid mixing, and we can conclude that the swapping algorithm is
efficient on the complete graph.
Abstract. Recently, several algorithms for the NP-complete problem $k$-SAT
have been proposed and rigorously analyzed. These algorithms are
based on the heuristic principle of local search. Their deterministic and
probabilistic versions and variations have been shown to achieve
the best complexity bounds that are known for $k$-SAT (or the special
case 3-SAT). We review these algorithms, their underlying principles
and their analyses.



1 Introduction 

Consider Boolean formulas in conjunctive normal form (CNF), i.e. formulas
which are conjunctions (and's) of disjunctions (or's) of variables or negated
variables. A variable or negated variable is called a literal. We assume here that
the variables are $x_1, x_2, \ldots, x_n$ and the complexity of all algorithms is
measured as a function of $n$. A formula in CNF is in $k$-CNF if all clauses
(disjunctions of literals) contain at most $k$ literals. It is well known that $k$-SAT is
NP-complete provided $k \ge 3$. Here $k$-SAT is the problem of determining whether
a formula in $k$-CNF is satisfiable. A formula is satisfiable if there is an assignment
$a : \{x_1, x_2, \ldots, x_n\} \to \{0, 1\}$ which assigns to each variable a Boolean value such
that the entire formula evaluates to true.

Since $k$-SAT is NP-complete, there is considerable interest in algorithms that
run faster than the naive algorithm, which just tests all $2^n$ potential assignments
and therefore takes worst-case time $2^n$ (up to some polynomial factor).
The case of 3-SAT is especially interesting, since $k$-SAT becomes NP-complete
starting with $k = 3$. A milestone paper in this respect is [10]. For every $k$ an
algorithm for $k$-SAT based on clever backtracking is presented there. In the case of
3-SAT a worst-case upper bound of $1.619^n$ is shown in [10]. In the general $k$-SAT
case, the complexity is $(\alpha_k)^n$, where $\alpha_k$ is the positive solution of the equation
$(\alpha_k)^{k-1} = (\alpha_k)^{k-2} + \cdots + (\alpha_k)^1 + 1$. The method has been improved in [9] to
$1.505^n$. Starting with the paper [12], probabilistic algorithms came onto the scene:
there an upper bound of $1.58^n$ is shown in the case of 3-SAT, which was
later improved in [13] to $1.36^n$.

Another probabilistic algorithm, based on local search, was shown to achieve
$(4/3)^n = (1.3333\ldots)^n$ in the case of 3-SAT [15]. This algorithm and its variations
are the theme of this article.
The notion “local search” refers to algorithms which wander through the
search space $\{0, 1\}^n$ of all assignments, at each step altering the current
assignment to some neighbor assignment. Neighborhood is defined in terms of
Hamming distance 1. The Hamming distance between two assignments is the
number of bits where the assignments differ.



2 A Local Search Procedure for k-SAT

Suppose an input formula $F$ in $k$-CNF is given. Let $F$ contain $n$ variables
$x_1, \ldots, x_n$. An assignment to these variables can be considered as a 0-1 string
of length $n$.

Consider the following recursive procedure local-search. 

procedure local-search (a : assignment; m : integer) : boolean;
{Returns true iff there is a satisfying assignment for formula F
 within Hamming distance ≤ m from assignment a.}

begin

  if F(a) = 1 then return true;
  if m = 0 then return false;

  {Let C = {l_1, l_2, ..., l_k} be some clause in F with C(a) = 0,
   i.e. all k literals l_i of C are set to 0 under a}

  for i := 1 to k do if local-search(a|l_i, m - 1) then return true;
  return false;
end local-search

Here $a|l_i$ denotes the assignment $a'$ obtained from $a$ by flipping the value of
literal $l_i$, i.e. changing the value of $l_i$ from 0 to 1. If $l_i$ is the (negated or
non-negated) variable $x_j$, this means that the $j$-th bit in $a$ is changed from 1 to 0
(or from 0 to 1, respectively). The Hamming distance between $a$ and this new
assignment $a'$ is 1.

The correctness of the procedure local-search follows from the following
observation. If there is a satisfying assignment $a^*$ of formula $F$, and the Hamming
distance between $a^*$ and the current assignment $a$ in the procedure invocation is $d$,
then for at least one of the assignments $a|l_i$, $i = 1, \ldots, k$, used in the recursive
procedure calls, the Hamming distance to $a^*$ is $d - 1$. At least one of the recursive
calls of local-search will therefore return the correct result.

It was suggested to modify the algorithm so that it “freezes” the assignment
of a variable once it has been flipped. This prevents the value of a variable from
being flipped back in a deeper recursive procedure call. This modification is also
correct, but will not be considered here. Another heuristic is to choose, among the
clauses $C$ with $C(a) = 0$, a shortest one.

An invocation of procedure local-search$(a, m)$, if $m > 0$ and $F(a) = 0$, causes
up to $k$ invocations of local-search$(\ldots, m - 1)$. That is, the recursion tree
induced by local-search$(a, m)$ has at most $k^m$ leaves. Hence, the complexity of
local-search$(a, m)$ is within a polynomial factor of $k^m$.
Now, here is a simple deterministic algorithm for 3-SAT. We perform two
calls of the procedure local-search, one for local-search$(0^n, n/2)$ and one for
local-search$(1^n, n/2)$. It is clear that the entire search space $\{0, 1\}^n$ is covered
by these two calls, since every assignment is within Hamming distance $n/2$
of either $0^n$ or $1^n$. The complexity of this deterministic 3-SAT algorithm is
within a polynomial factor of $3^{n/2} \approx 1.732^n$, a bound which comes close to the
Monien-Speckenmeyer bound [10] but is obtained in a much simpler way.

Notice that it is not a good idea to generalize this to $k$-SAT with $k \ge 4$, since
in this case $k^{n/2} \ge 2^n$.
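The procedure local-search and the two-call deterministic algorithm can be sketched in Python as follows. The encoding is our own assumption: a clause is a tuple of nonzero integers in DIMACS style ($j$ for $x_j$, $-j$ for $\neg x_j$) and an assignment is a tuple of bits. As in the pseudocode, the first falsified clause found plays the role of $C$.

```python
def satisfies(clause, a):
    """True iff some literal of the clause is true under assignment a."""
    return any((a[abs(l) - 1] == 1) == (l > 0) for l in clause)


def local_search(formula, a, m):
    """True iff some satisfying assignment lies within Hamming distance m of a."""
    falsified = [c for c in formula if not satisfies(c, a)]
    if not falsified:
        return True                    # F(a) = 1
    if m == 0:
        return False
    for l in falsified[0]:             # C = some clause with C(a) = 0
        b = list(a)
        b[abs(l) - 1] ^= 1             # a|l: flip the variable of literal l, making l true
        if local_search(formula, tuple(b), m - 1):
            return True
    return False


def deterministic_3sat(formula, n):
    """Two calls cover {0,1}^n: every assignment is within n // 2 of 0^n or of 1^n."""
    return (local_search(formula, (0,) * n, n // 2)
            or local_search(formula, (1,) * n, n // 2))
```

Using `n // 2` as the radius suffices even for odd $n$: an assignment with $w$ ones is at distance $w$ from $0^n$ and $n - w$ from $1^n$, and the smaller of the two is at most $\lfloor n/2 \rfloor$.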



3 Random Initial Assignments 



A better idea (introduced in [14]) is to use independent random initial
assignments $a_1, a_2, \ldots, a_t \in \{0, 1\}^n$, and at the same time use a Hamming radius $\beta n$
smaller than $n/2$, so that this new (probabilistic) algorithm for $k$-SAT looks like
this:

for t times do
begin
  Choose an assignment a ∈ {0,1}^n uniformly at random;
  if local-search(a, βn) then accept
end;
reject

The question is how to choose $t$ and $\beta$ optimally so that the error probability
becomes negligible and the overall complexity is minimized. It is clear
that the overall (worst-case) complexity is $t \cdot k^{\beta n}$, where $t$ could be a function
depending on $n$.

Regarding the error probability, a single invocation of local-search$(a, \beta n)$ with
randomly chosen $a \in \{0, 1\}^n$ finds a satisfying assignment (if a satisfying
assignment $a^*$ exists) with probability at least

$$\frac{\sum_{i=0}^{\beta n} \binom{n}{i}}{2^n}.$$

Since the $t$ random assignments are chosen independently, the (error) probability of
missing the satisfying assignment $a^*$ in every invocation of local-search$(a, \beta n)$ is
at most

$$\left(1 - \frac{\sum_{i=0}^{\beta n} \binom{n}{i}}{2^n}\right)^t.$$

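To bring this error probability below some negligible value, like $e^{-20}$, it suffices
to choose

$$t = 20 \cdot \frac{2^n}{\sum_{i=0}^{\beta n} \binom{n}{i}},$$

since $(1 - x)^t \le e^{-xt}$.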
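The number of repetitions can be computed exactly for concrete $n$ and $\beta$. In this Python sketch (hypothetical helper name) the target error probability $e^{-c}$ is a parameter, with $c = 20$ used for illustration; the radius is taken as $\lfloor \beta n \rfloor$, and $t$ is the smallest integer with $t \cdot p \ge c$, which suffices because $(1-p)^t \le e^{-pt}$.

```python
import math


def repetitions(n, beta, c=20):
    """Smallest integer t with t * p >= c, where p = sum_{i <= beta*n} C(n, i) / 2^n.

    Since (1 - p)^t <= exp(-p * t), this t drives the error probability below e^{-c}.
    """
    r = math.floor(beta * n)
    ball = sum(math.comb(n, i) for i in range(r + 1))  # size of a Hamming ball of radius r
    t = -(-c * 2 ** n // ball)                          # exact ceiling of c * 2^n / ball
    return t, ball / 2 ** n
```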




Using Stirling’s approximation $n! \approx (n/e)^n \sqrt{2\pi n}$, it can be seen that, for fixed
$\beta < 1/2$, the expression $\sum_{i=0}^{\beta n} \binom{n}{i}$ behaves asymptotically, up to a polynomial factor,
like $\left[(1/\beta)^{\beta} (1/(1-\beta))^{1-\beta}\right]^n$ (see [2]). Therefore, it suffices, up to a polynomial
factor, to choose $t$ as

$$t = \left[2 \cdot \beta^{\beta} \cdot (1-\beta)^{1-\beta}\right]^n.$$

The overall complexity of this probabilistic $k$-SAT algorithm is therefore within
a polynomial factor of

$$t \cdot k^{\beta n} = \left[2 \cdot \beta^{\beta} \cdot (1-\beta)^{1-\beta} \cdot k^{\beta}\right]^n.$$

By calculating the derivative of the expression in brackets and setting it to
zero, it can be seen that the expression in brackets is minimized by the choice
$\beta = 1/(k+1)$. Inserting $\beta = 1/(k+1)$ in the expression gives the complexity
bound $[2k/(k+1)]^n$. In the case of 3-SAT this is $1.5^n$.
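The optimization over $\beta$ can be double-checked numerically. This Python sketch (hypothetical helper names) minimizes the bracketed expression on a fine grid over $(0, 1/2)$ and compares the minimizer and minimum with $1/(k+1)$ and $2k/(k+1)$.

```python
def growth(beta, k):
    """2 * beta^beta * (1 - beta)^(1 - beta) * k^beta: the base of the running time."""
    return 2 * beta ** beta * (1 - beta) ** (1 - beta) * k ** beta


def best_beta(k, steps=20_000):
    """Grid minimizer of growth(., k) over beta in (0, 1/2)."""
    return min((i / (2 * steps) for i in range(1, steps)), key=lambda b: growth(b, k))
```

For $k = 3$ the grid minimizer is $0.25$ and `growth(0.25, 3)` is $1.5$ up to floating-point rounding, matching the bound $1.5^n$.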

Actually, by analyzing the possible structure of 3 successive recursion levels
of local-search in a careful case analysis, one can see that not all 27 possible
recursive calls (after 3 levels of local-search) can occur. Therefore, in the
case of 3-SAT, the complexity bound can be further reduced, namely to $1.481^n$
(see [4,5]). (Actually, this is not just a closer analysis but also a modification of
the algorithm, since the clauses $C$ in the procedure local-search need to be chosen
carefully.) This bound also applies to the derandomized version of the algorithm
discussed in the next section.

4 Derandomization Using Covering Codes 

The above example $\{0^n, 1^n\}$ of two specially chosen initial assignments
(together with an appropriate Hamming distance, which is $n/2$ in this case) is
nothing but a special covering code (see [3]). Instead of choosing the initial
assignments at random as in the previous section, we are now looking for a
systematic, deterministic way of selecting $\{a_1, a_2, \ldots, a_t\}$ such that, together with
a Hamming distance $\beta n$ as small as possible, we get

$$\{0,1\}^n = \bigcup_{i=1}^{t} H_{\beta n}(a_i).$$

Here $H_m(a) = \{b \in \{0, 1\}^n \mid \text{the Hamming distance between } a \text{ and } b \text{ is at most } m\}$.
With such a code, the algorithm of the last section becomes deterministic,
since we can cycle through all code words (i.e. initial assignments) systematically.

In terms of coding theory, we are looking for codes with small covering radius
($\beta n$) and a small number of code words ($t$) such that the enumeration of all code
words is possible within time polynomial in the number of code words. In the
ideal case of a perfect code, the relation between $\beta$ and $t$ reaches the Hamming
bound, i.e. the number of code words $t$ satisfies

$$t = \frac{2^n}{\sum_{i=0}^{\beta n} \binom{n}{i}},$$

which is, up to a polynomial factor, $\left[2 \cdot \beta^{\beta} \cdot (1-\beta)^{1-\beta}\right]^n$. Actually, as seen in
the last section, the Hamming bound is reached by a set of random code words,
up to a polynomial factor and up to a very small error probability.

In [4,5] two approaches are suggested to build a good covering code. The
first approach achieves “almost perfect” codes and needs only polynomial space.
The second approach achieves perfect codes (up to some polynomial factor), but
needs exponential space.

Approach 1: Suppose $\beta$ is fixed beforehand. Using a probabilistic analysis like that in
the last section, it can be proved that for every $\epsilon > 0$ there is a code length $n_\epsilon$
such that a random code $\{a_1, a_2, \ldots, a_{t_\epsilon}\}$ with this code length will satisfy
$\{0,1\}^{n_\epsilon} = \bigcup_{i=1}^{t_\epsilon} H_{\beta n_\epsilon}(a_i)$ with high probability, provided that the number of
code words $t_\epsilon$ satisfies $t_\epsilon = \left[2 \cdot \beta^{\beta} \cdot (1-\beta)^{1-\beta}\right]^{(1+\epsilon) n_\epsilon}$. Now, given the actual
number of variables $n$, we assume that $n$ is a multiple of $n_\epsilon$, i.e. $n = q_\epsilon \cdot n_\epsilon$.
Then we can design a covering code of code length $n$ as follows: each code
word consists of $q_\epsilon$ blocks of length $n_\epsilon$, and each block cycles through all the
code words $\{a_1, a_2, \ldots, a_{t_\epsilon}\}$ independently. That is, this code has $(t_\epsilon)^{n/n_\epsilon}$, i.e.
$\left[2 \cdot \beta^{\beta} \cdot (1-\beta)^{1-\beta}\right]^{(1+\epsilon) n}$, code words. For every choice of $\epsilon$ the small code
of length $n_\epsilon$ can be “hard-wired” into the respective $k$-SAT algorithm, which is
deterministic (although the existence of those constant-size codes is shown by
a probabilistic argument) and which, for $\beta = 1/(k+1)$, has complexity $[2k/(k+1)]^{(1+\epsilon) n}$.
Notice that the space complexity is polynomial.
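The block construction of Approach 1 is easy to express in code. In this Python sketch (hypothetical helper names), concatenating $q$ independent blocks, each ranging over a code of length $n_\epsilon$ with covering radius $r$, yields a code of length $q\, n_\epsilon$ with $t^q$ words and covering radius at most $q r$, because Hamming distance adds over blocks. The brute-force radius computation is only meant for toy sizes.

```python
from itertools import product


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def covering_radius(code, length):
    """Max over all words w of the given length of the distance from w to the nearest code word."""
    return max(min(hamming(w, c) for c in code)
               for w in product((0, 1), repeat=length))


def block_code(small_code, q):
    """All concatenations of q blocks, each block ranging independently over small_code."""
    return [sum(blocks, ()) for blocks in product(small_code, repeat=q)]
```

For example, the repetition code $\{000, 111\}$ has covering radius 1, and `block_code` with $q = 2$ gives 4 words of length 6 with covering radius 2.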

Approach 2: Fix the constant $c = 6$ and construct a covering code of length
$n/c$ in a greedy manner: enumerate all strings of length $n/c$ and, as the next code
word, always choose among the remaining strings one which covers the
most strings not yet covered. The time required by this greedy algorithm is $2^{3n/c}$.
Also, exponential space is needed. The number of code words is within a
polynomial factor of the Hamming bound (cf. [7]). Then, similar to Approach 1,
concatenate $c$ code words to obtain one code word of length $n$. Finally, a
deterministic $k$-SAT algorithm is obtained with worst-case time complexity
$(2k/(k+1))^n$ (up to a polynomial factor) and exponential space complexity.
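A Python sketch of the greedy construction of Approach 2 (hypothetical function name). The precomputed table of all balls makes the exponential time and space of this approach visible; ties are broken by taking the first best string in lexicographic order, which is our own choice.

```python
from itertools import product


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def greedy_covering_code(length, radius):
    """Repeatedly add the word whose radius-ball covers the most not-yet-covered words."""
    words = list(product((0, 1), repeat=length))
    ball = {w: {v for v in words if hamming(w, v) <= radius} for w in words}
    uncovered = set(words)
    code = []
    while uncovered:
        best = max(words, key=lambda w: len(ball[w] & uncovered))
        code.append(best)
        uncovered -= ball[best]
    return code
```

For length 3 and radius 1 this returns the perfect code $\{000, 111\}$; for longer lengths the greedy code is in general somewhat larger than the sphere-covering bound $2^N / |H_r|$.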



5 Probabilistic Local Search 

Now we turn back to a probabilistic procedure (based on [15]). First, we go
back to the approach of choosing the initial assignment at random. Second, the
deterministic search done by procedure local-search is replaced by a random
search: each time a clause $C$ has been selected which is false under the current
assignment $a$, one of the literals in this clause is selected uniformly at random
and its value under the assignment $a$ is flipped. The resulting algorithm looks
as follows.
for t times do
begin

  Choose an assignment a ∈ {0,1}^n uniformly at random;

  for u times do
  begin

    if F(a) = 1 then accept;

    {Let C = {l_1, l_2, ..., l_k} be some clause in F with C(a) = 0,
     i.e. all k literals l_i of C are set to 0 under a}

    Choose one of the k literals l_i at random;

    Flip the value of l_i in assignment a;

  end;

end;

reject
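A Python sketch of this randomized algorithm, again assuming the DIMACS-style clause encoding ($j$ for $x_j$, $-j$ for $\neg x_j$). The parameters `t` and `u` are left open, since their values are derived in the analysis below. Like the pseudocode, the sketch picks the first falsified clause and flips a uniformly random literal of it.

```python
import random


def satisfies(clause, a):
    """True iff some literal of the clause is true under assignment a."""
    return any((a[abs(l) - 1] == 1) == (l > 0) for l in clause)


def random_walk_sat(formula, n, t, u, rng):
    """Return a satisfying assignment found by t restarts of u random flips each, or None."""
    for _ in range(t):
        a = [rng.randint(0, 1) for _ in range(n)]   # uniform random initial assignment
        for _ in range(u):
            falsified = [c for c in formula if not satisfies(c, a)]
            if not falsified:
                return a                            # accept
            lit = rng.choice(falsified[0])          # C = some clause with C(a) = 0
            a[abs(lit) - 1] ^= 1                    # flip the chosen literal
    return None                                     # reject
```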

The algorithm is similar to the one in [11], but there it is only applied to 2-SAT. 
We have to determine optimal values for t and u (depending on n and k). 
This algorithm can be analyzed using Markov chains. Suppose F is satisfiable. 
Fix some satisfying assignment a*. Under assignment a*, in each clause C of F,
at least one literal becomes true. We fix in each clause exactly one such literal, 
and call it the good literal of the respective clause. 

After guessing the random initial assignment a, we have some Hamming
distance to the fixed assignment a*. Clearly, this Hamming distance is a random
variable with a symmetric binomial distribution, i.e.
$\Pr(\text{Hamming distance} = j) = 2^{-n}\binom{n}{j}$. In each step of the inner for-times-loop
the probability of guessing the good literal in clause C is exactly 1/k. With
probability 1 − 1/k we choose a “bad” literal (even if it also makes C true). That
is, we perform a random walk on an imaginary Markov chain. We move one step
closer to our goal (state zero) if we choose the good literal (with probability 1/k),
and otherwise (with probability 1 − 1/k) we increase the distance to state zero by
one. Notice that the state number is an upper bound on the Hamming distance
between the current assignment a and the fixed satisfying assignment a*.

For this (infinite) Markov chain the following is known by standard results
(see [6,16]), assuming k ≥ 3:

- $\Pr(\text{absorbing state 0 is reached} \mid \text{process started in state } j) = (1/(k-1))^j$

- $E(\text{number of steps until state 0 is reached} \mid \text{process started in state } j \text{ and the absorbing state 0 is reached}) = jk/(k-2)$

The following facts take the initial probability $\binom{n}{j}2^{-n}$ for state j into account.

- $\Pr(\text{the absorbing state 0 is reached}) = \sum_{j=0}^{n} \binom{n}{j} 2^{-n} (1/(k-1))^j = (k/(2(k-1)))^n$

- $E(\text{number of steps until 0 is reached} \mid \text{the absorbing state 0 is reached}) = n/(k-2)$.
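The two averaged facts follow from the per-state facts via the binomial theorem; conditioned on absorption, the start state is Binomial(n, 1/k) distributed, and its mean n/k times k/(k − 2) gives n/(k − 2). The sketch below is only a sanity check of these identities with exact rational arithmetic for small n and k (the function name is hypothetical); it does not simulate the walk.

```python
from fractions import Fraction
from math import comb


def averaged_quantities(n, k):
    """Average the per-state facts over the start distribution Pr(j) = C(n,j) 2^-n.
    Returns (Pr[state 0 is reached], E[steps | state 0 is reached])."""
    q = Fraction(1, k - 1)                       # Pr[reach 0 | start in j] = q**j
    start = [Fraction(comb(n, j), 2 ** n) for j in range(n + 1)]
    reach = sum(start[j] * q ** j for j in range(n + 1))
    # E[steps | start j, reached] = jk/(k-2), weighted by Pr[start j and reached]
    steps = sum(start[j] * q ** j * Fraction(j * k, k - 2) for j in range(n + 1)) / reach
    return reach, steps
```

For instance, `averaged_quantities(4, 3)` gives `(Fraction(81, 256), Fraction(4, 1))`, i.e. $(3/4)^4$ and $4/(3-2)$.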
Using Markov’s inequality and a calculation with conditional probabilities, we
finally obtain

$\Pr(\text{after at most } 2n/(k-2) \text{ steps the state 0 is reached}) \ge \frac{1}{2}\cdot\left(\frac{k}{2(k-1)}\right)^n$

As in Section 3, to obtain an error probability of at most $e^{-20}$, we need to
repeat the random experiment $20 \cdot 2 \cdot (2(k-1)/k)^n$ times. Therefore, the optimal
choice for u is $u = 2n/(k-2)$ and for t is $t = 40 \cdot (2(k-1)/k)^n$. Thus, the entire
probabilistic algorithm has worst-case complexity $(2(k-1)/k)^n$, up to a
polynomial factor. In the case of 3-SAT, the algorithm has complexity $(4/3)^n$,
up to a polynomial factor.
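Plugging numbers into these choices of t and u, as a small sketch (the helper name is hypothetical, and both values are rounded up to integers). The usage lines check that t restarts push the failure probability $(1-p)^t$, with $p = \frac12 (k/(2(k-1)))^n$ from the bound above, below $e^{-20}$.

```python
import math


def walk_parameters(n, k):
    """Restarts t, walk length u, and base 2(k-1)/k of the running time, for k >= 3."""
    base = 2 * (k - 1) / k
    u = math.ceil(2 * n / (k - 2))          # u = 2n/(k-2)
    t = math.ceil(40 * base ** n)           # t = 40 * (2(k-1)/k)^n
    return t, u, base


n, k = 10, 3
t, u, base = walk_parameters(n, k)
p = 0.5 * (k / (2 * (k - 1))) ** n          # lower bound on the success probability per try
print(t, u, round(base, 4))                 # 711 20 1.3333
print((1 - p) ** t <= math.exp(-20))        # True
```

The base 2(k − 1)/k grows with k: 4/3 for 3-SAT, 3/2 for 4-SAT, 8/5 for 5-SAT.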

6 Further Improvement by Independent Clauses 

In the following, the bound $(4/3)^n$ for 3-SAT is improved somewhat.
The idea (from [17]) is to replace the “blind” random guessing of an initial
assignment $a \in \{0,1\}^n$ by a cleverer random process such that “sometimes”
the Hamming distance to the fixed satisfying assignment is small (and the
subsequent random walk on the Markov chain is more likely to be successful).

A key notion is that of independent clauses. Two clauses are independent if
they have no variable in common. The following simple algorithm computes a
maximal set $\mathcal{C}$ of independent clauses in a greedy manner. Let the clauses with
3 literals in the input formula F be ordered in some way: $C_1, C_2, \ldots, C_m$.

𝒞 := ∅;

for i := 1 to m do

  if clause C_i is independent of all clauses in 𝒞 then 𝒞 := 𝒞 ∪ {C_i};
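The greedy procedure takes only a few lines of Python. This sketch assumes clauses are lists of non-zero integers (DIMACS-style literals) with distinct variables in each clause; clauses that do not have exactly 3 literals are skipped, as in the text. The function name is hypothetical.

```python
def independent_clauses(clauses):
    """Greedily collect a maximal set of pairwise variable-disjoint 3-literal clauses."""
    chosen, used_vars = [], set()
    for c in clauses:
        if len(c) != 3:
            continue                              # only 3-literal clauses are ordered
        vars_c = {abs(l) for l in c}
        if vars_c.isdisjoint(used_vars):          # independent of all chosen clauses
            chosen.append(c)
            used_vars |= vars_c
    return chosen
```

Since every chosen clause uses three fresh variables, the result has at most n/3 clauses, and by construction every 3-literal clause shares a variable with some chosen clause.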

Notice that the obtained set $\mathcal{C}$ has size at most n/3. Now either $|\mathcal{C}|$ is “small”
($|\mathcal{C}| \le \alpha n$ for some constant α to be determined later), or it is “big” (i.e. $|\mathcal{C}| > \alpha n$).

Case 1: $|\mathcal{C}| \le \alpha n$.

In this case we apply the following algorithm. Notice that each independent
clause $C \in \mathcal{C}$ can be satisfied by 7 assignments to its 3 variables.
These assignments can be combined independently for the different clauses in $\mathcal{C}$.
For each of the $7^{|\mathcal{C}|} \le 7^{\alpha n}$ assignments to the variables occurring in clauses from $\mathcal{C}$
we do the following. This (partial) assignment is applied to the formula. The result
is that all clauses from $\mathcal{C}$ are satisfied, and additionally, every remaining clause
either already had at most 2 literals, or at least one of its literals is set by the
partial assignment. This is because $\mathcal{C}$ was constructed to be a maximal set of
independent clauses. In other words, the remaining formula is a 2-CNF formula.
Satisfiability of 2-CNF formulas can be tested in polynomial time [1]. Therefore,
in this case, we have a satisfiability test for the input formula F running in time
at most $7^{\alpha n}$, up to a polynomial factor.
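One standard polynomial-time 2-SAT test builds the implication graph (a clause a ∨ b yields the edges ¬a → b and ¬b → a) and reports unsatisfiability iff some variable and its negation lie in the same strongly connected component. The sketch below uses Kosaraju's algorithm; whether this is exactly the method of [1] is not stated here. Literals are DIMACS-style integers, a unit clause (a) is treated as (a ∨ a), and the recursion is adequate only for small formulas.

```python
def two_sat(n, clauses):
    """Satisfiability of a 2-CNF over variables 1..n (clauses of 1 or 2 int literals)."""
    def node(lit):                       # x_v -> 2(v-1), not x_v -> 2(v-1)+1
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    size = 2 * n
    graph = [[] for _ in range(size)]
    rgraph = [[] for _ in range(size)]   # reversed edges
    for c in clauses:
        a, b = c[0], c[-1]               # unit clause (a) becomes (a or a)
        for src, dst in ((node(-a), node(b)), (node(-b), node(a))):
            graph[src].append(dst)
            rgraph[dst].append(src)

    order, seen = [], [False] * size
    def dfs1(v):                         # first pass: record finishing order
        seen[v] = True
        for w in graph[v]:
            if not seen[w]:
                dfs1(w)
        order.append(v)
    for v in range(size):
        if not seen[v]:
            dfs1(v)

    comp = [-1] * size
    def dfs2(v, label):                  # second pass on reversed graph: label SCCs
        comp[v] = label
        for w in rgraph[v]:
            if comp[w] == -1:
                dfs2(w, label)
    label = 0
    for v in reversed(order):
        if comp[v] == -1:
            dfs2(v, label)
            label += 1

    # Unsatisfiable iff some x_v and not x_v end up in the same component.
    return all(comp[2 * i] != comp[2 * i + 1] for i in range(n))
```

In Case 1 this test would be run once for each of the at most $7^{\alpha n}$ partial assignments, on the 2-CNF formula that remains.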






Case 2: $|\mathcal{C}| > \alpha n$.

In this case we use the random walk algorithm as in the last section, but
with a modified initial distribution of the assignments. We choose the initial
assignment a at random using the following stochastic process. For each clause
$C \in \mathcal{C}$, choose the assignment for the 3 literals in C from the 8 assignments
000, 001, 010, 100, 011, 101, 110, 111 according to the following probability
distribution:

assignment    000   001   010   100   011   101   110   111
probability    0     z     z     z     y     y     y     x



These probabilities x, y, z are then determined by analyzing the complexity of
this modified algorithm and solving a set of linear equations. The result is

$x = 1/7, \quad y = 2/21, \quad z = 4/21.$
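These values can be checked with exact arithmetic under an assumption that the text does not spell out here: as in Section 5 with k = 3, a start at Hamming distance d from a* contributes a factor $(1/2)^d$, so a clause in $\mathcal{C}$ contributes the expectation of $(1/2)^d$ over its three variables, where d is measured from the literal-value pattern that a* gives the clause (at least one literal true). The sketch shows that with the given x, y, z this expectation is 3/7 for every such pattern, matching the factor $(3/7)^{|\mathcal{C}|}$ below; uniform guessing would give only $(3/4)^3 = 27/64$.

```python
from fractions import Fraction
from itertools import product

x, y, z = Fraction(1, 7), Fraction(2, 21), Fraction(4, 21)


def prob(bits):
    """Probability of a literal-value pattern; it depends only on the number of 1s."""
    return {0: Fraction(0), 1: z, 2: y, 3: x}[sum(bits)]


def clause_factor(target):
    """E[(1/2)^d], where d is the Hamming distance of the random pattern to `target`."""
    return sum(prob(a) * Fraction(1, 2) ** sum(ai != ti for ai, ti in zip(a, target))
               for a in product((0, 1), repeat=3))


patterns = list(product((0, 1), repeat=3))
print(sum(prob(a) for a in patterns))                          # 1
print({clause_factor(t) for t in patterns if any(t)})          # {Fraction(3, 7)}
```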



With these values, the success probability becomes

$(3/4)^{n-3|\mathcal{C}|} \cdot (3/7)^{|\mathcal{C}|} \ge (3/4)^{(1-3\alpha)n} \cdot (3/7)^{\alpha n}$



The reciprocal value gives the complexity of the algorithm (up to a polynomial
factor). The optimal value for α, used to distinguish Cases 1 and 2, can now be
determined by solving

$7^{\alpha} = (4/3)^{1-3\alpha} (7/3)^{\alpha}$,

yielding α ≈ 0.146652. With these parameters we obtain the total running time
$1.3303^n$, up to a polynomial factor.
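After taking logarithms the equation for α is linear: α ln 7 = (1 − 3α) ln(4/3) + α ln(7/3). A short numerical check in Python (an illustration, not part of the original analysis):

```python
import math

# Case 1 costs 7^(alpha*n); Case 2 costs ((4/3)^(1-3*alpha) * (7/3)^alpha)^n.
# Equating the bases and taking logarithms gives a linear equation in alpha.
alpha = math.log(4 / 3) / (math.log(7) + 3 * math.log(4 / 3) - math.log(7 / 3))
base = 7 ** alpha            # running time base^n, up to a polynomial factor
print(f"alpha = {alpha:.6f}, base = {base:.4f}")   # alpha = 0.146652, base = 1.3303
```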

A further minor improvement is possible as follows, cf. [8]. Instead of fixing
the values for x, y, z in one way, one can run a finite set of algorithms in parallel
which all use different (x, y, z)-values. Even if we do not know beforehand how
many and which of the literals in the clauses are satisfied under a*, at least one of
the finitely many algorithms will obtain a somewhat better success probability
than the one calculated above. The new bound is $1.3302^n$. The details will be
presented in the full paper.
Abstract. This paper gives a gentle introduction to the semigroup-
theoretic approach to classifying discrete temporal properties and sur-
veys the most important results.



1 Introduction 

Linear temporal logic is a widely used, rigorous formalism for specifying temporal 
properties conveniently, namely by using formal counterparts of natural-language 
constructs such as “always”, “eventually”, “until”, “hitherto” and so forth.
Developed in the 1960s as an extension of tense (modal) logic, linear temporal
logic had been primarily studied in mathematical logic and linguistics [8] before
Pnueli [12] and Kröger [9] suggested using it for specifying temporal properties
of nonterminating computations such as the behavior of reactive systems. Since
then it has become ever more important in computer science, especially as it
is feasible to determine whether or not a finite-state system (such as a piece
of hardware or a communication protocol) satisfies a desirable linear temporal
property.

The most important feature of linear temporal logic is its set of temporal
operators, which allow one to express relations between the propositional variables
that hold at various points in time. From the most general point of view, a temporal
operator is simply a first-order formula with one free first-order variable and
free unary predicate symbols [6]. As a consequence, there are infinitely many
temporal operators, but only a few of them are in widespread use, namely those
that are easy to phrase in natural language and, at the same time, powerful
enough to express what is necessary.

It is an intriguing question to determine how much each of the temporal op- 
erators in use adds to the expressive power of linear temporal logic. Equivalently, 
one can ask which temporal properties can be expressed if linear temporal logic 
is restricted to only certain operators. Similarly, it is interesting to determine 
what can be expressed if the nesting depth of the binary temporal operators 
(which, for a human, are more difficult to parse) is restricted. More generally, it is
an interesting question to characterize the various fragments of linear temporal
logic.

There are different ways of characterizing fragments of linear temporal logic. 
One way is to show that the expressive power of a given fragment is the same 
as the expressive power of a different logic or another formalism. Results of this 
kind include Kamp’s famous result that linear temporal logic and first-order
logic are equally expressive (in Dedekind-complete time structures) [8], that lin- 
ear temporal logic restricted to unary operators is as expressive as first-order 
logic with two variables [5], and that in linear temporal logic one can define 
exactly the same formal languages that can be defined by star-free regular
expressions [8,13,10]. From a computational point of view, it is more desirable to
have an effective characterization in the form of an algorithm that, given a
property, determines whether it is expressible in the fragment in question and
computes, if possible, a respective formula. For example, from the same work mentioned
above [8,13,10] it follows that a regular language is expressible in linear temporal 
logic if and only if its syntactic semigroup contains no subsemigroup which is a 
non-trivial group. This is an effective description, for from any reasonable rep- 
resentation of a regular language (finite automaton, regular expression, monadic 
second-order formula) one can effectively obtain the multiplication table of its 
syntactic semigroup and by an exhaustive search it is easy to check whether it 
contains a non-trivial group as a subsemigroup. 
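A minimal sketch of the exhaustive check in Python, assuming the syntactic semigroup is already available as a multiplication table (a dict from pairs of element names to products; names are hypothetical). Rather than enumerating subsemigroups, it uses the standard equivalent condition that a finite semigroup has no non-trivial subgroup iff it is aperiodic, i.e. for every s the powers eventually satisfy $s^m = s^{m+1}$.

```python
def is_aperiodic(elements, mul):
    """True iff the finite semigroup given by its multiplication table (a dict
    mapping (x, y) to x*y) is aperiodic, i.e. contains no non-trivial group."""
    for s in elements:
        seen, p = [], s
        while p not in seen:              # walk s, s^2, s^3, ... until a power repeats
            seen.append(p)
            p = mul[p, s]
        # p is the first repeated power, so it lies on the cycle of powers of s;
        # the cycle has length 1 (no group generated here) iff p * s = p
        if mul[p, s] != p:
            return False
    return True


Z2 = {("e", "e"): "e", ("e", "g"): "g", ("g", "e"): "g", ("g", "g"): "e"}
print(is_aperiodic(["e", "g"], Z2))       # False: {e, g} is a two-element group
```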

The above decision procedure follows a pattern which has been used for 
many effective characterizations of fragments of linear temporal logic and has 
been most successful in that it has also worked for fairly complicated fragments; 
the purpose of this paper is to explain this pattern, the “semigroup paradigm”. 

As described above, the semigroup paradigm is very simple. If one wants
to obtain an effective characterization of a certain fragment F (where time is
modeled as an initial segment of the natural numbers), one first considers the
class L of formal (regular) languages that can be defined by a formula in the
fragment. (Since every temporal property defines a regular language, L is a
perfect description of the expressive power of F.) Next, one considers the smallest
class V of finite semigroups which contains all syntactic semigroups of the languages
in L and is closed under subsemigroups, homomorphic images, and finite direct
products, that is, the pseudovariety generated by the syntactic semigroups
of the languages in L. In most cases, V is a perfect description of L in the
sense that V contains no syntactic semigroup of a language not in L. In the last
step, one uses semigroup theory to show that V is decidable, which also implies
that expressibility in F is decidable, for passing from a linear temporal formula
to the syntactic semigroup of the language associated with it is effective. In
many instances, the most difficult part is the last step, namely to find a decision
procedure for V. The purpose of this paper is to illustrate and explain this
approach in more detail.

In most applications, modalities are used that refer only to the present and
the future (e.g., “sometime in the future”, “until”), even though Kamp, when he
defined temporal logic, had one future and one past modality in his logic. The 
restriction to future modalities is no restriction from a theoretical point of view: 
by a theorem of Pnueli, Shelah, Stavi and Gabbay [7], every linear temporal 
formula is equivalent to a future formula if one is only interested in defining 
properties of time structures with a starting point. 
As pointed out above, Kröger and Pnueli suggested using linear temporal
logic for specifying properties of non-terminating computations. So time should
be modeled as the natural numbers (assuming one is interested in discrete time).
In this paper, only finite prefixes of the natural numbers are considered. Most
results presented here can be extended to the natural numbers (ω-words), but
different techniques are required, which are beyond the scope of this paper.

Basic notation. When ≡ is an equivalence relation over a set M and m ∈ M,
then $[m]_\equiv$ denotes the equivalence class of m. Accordingly, when $M' \subseteq M$, then
$M'/\!\equiv$ denotes the set of all equivalence classes of elements from M', that is,
$M'/\!\equiv \;=\; \{[m]_\equiv \mid m \in M'\}$.

Strings are sequences of letters over a finite alphabet. The length of a string u
is denoted |u|. The i-th letter of a string u is denoted $u_i$, that is, $u = u_1 \ldots u_{|u|}$.
The suffix of a string u starting at position i, the string $u_i \ldots u_{|u|}$, is denoted
$\mathrm{suf}_i(u)$ and, similarly, the prefix of length i is denoted $\mathrm{pref}_i(u)$.

As usual, $A^*$ and $A^+$ denote the set of all strings over A and all non-empty
strings over A, respectively.

2 Linear Temporal Logic 

In this section, the basics on linear temporal logic are recalled. For more on this, 
see [4] and [6]. 

2.1 Syntax 

A future linear temporal logic formula (LTL formula) over some finite set S of
propositional variables is built from the boolean constant tt and the elements of
S using boolean connectives and temporal operators: X (neXt), F (eventually in
the Future, Finally), and U (Until), where only U is binary and the others are
unary.



2.2 Semantics 

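Given an LTL formula φ over some set S of propositional variables and a string
$u \in (2^S)^+$, we write $u \models \varphi$ for the fact that u satisfies φ. In particular,

- $u \models \mathrm{tt}$,

- $u \models p$ iff $p \in u_1$,

- $u \models \varphi\,\mathrm{U}\,\psi$ iff there exists j with $1 < j \le |u|$ such that $\mathrm{suf}_j(u) \models \psi$ and
$\mathrm{suf}_i(u) \models \varphi$ for all i with $1 < i < j$.

The semantics of X and F is derived from this:

$\mathrm{X}\varphi = \neg\mathrm{tt}\,\mathrm{U}\,\varphi, \qquad \mathrm{F}\varphi = \mathrm{tt}\,\mathrm{U}\,\varphi. \qquad (1)$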
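A small Python evaluator makes the strict semantics concrete. It is a sketch under assumptions of its own: formulas are nested tuples (a hypothetical encoding, not notation from the text), words are lists of sets of propositional variables, positions are 0-based, and X and F are defined exactly as in (1).

```python
def holds(u, phi, i=0):
    """Does suf_{i+1}(u) satisfy phi?  u is a list of sets of variables; phi is one of
    ('tt',), ('var', p), ('not', f), ('and', f, g), ('U', f, g)."""
    op = phi[0]
    if op == "tt":
        return True
    if op == "var":
        return phi[1] in u[i]
    if op == "not":
        return not holds(u, phi[1], i)
    if op == "and":
        return holds(u, phi[1], i) and holds(u, phi[2], i)
    if op == "U":
        for j in range(i + 1, len(u)):        # witness positions strictly after i
            if holds(u, phi[2], j):
                return True
            if not holds(u, phi[1], j):       # phi[1] must hold strictly in between
                return False
        return False
    raise ValueError(f"unknown operator {op!r}")


def X(f):
    return ("U", ("not", ("tt",)), f)         # X f = (not tt) U f


def F(f):
    return ("U", ("tt",), f)                  # F f = tt U f
```

Because U is strict, X fails at the last position and F never inspects the current position: on the word [{"a"}, {"b"}], F applied to ("var", "a") is false.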
With every formula φ over S, we associate a language over $2^S$:

$L(\varphi) = \{u \in (2^S)^+ \mid u \models \varphi\}. \qquad (2)$

We say L(φ) is defined by φ.

When we want to define languages over arbitrary alphabets (rather than
alphabets of the form $2^S$), we will use the following convention. We view A as a
subset of some appropriate alphabet $2^S$. More formally, for each alphabet A we
choose once and for all a set S of propositional variables such that $2^{|S|-1} < |A| \le 2^{|S|}$
and an injective mapping $\iota\colon A \to 2^S$, which we extend to a homomorphism
$\iota\colon A^* \to (2^S)^*$. For each formula $\varphi \in \mathrm{LTL}_A$ (where the atomic formulas are
letters from A), we set

$L(\varphi) = \{u \in A^+ \mid \iota(u) \models \varphi'\}, \qquad (3)$

where φ' is obtained from φ by replacing every occurrence of a letter a by

$\bigwedge_{p \in \iota(a)} p \;\wedge \bigwedge_{p \in S \setminus \iota(a)} \neg p. \qquad (4)$

As above, L(φ) is the language defined by φ.

With every class Φ of formulas we associate the class L(Φ) of all languages
defined by formulas in Φ. The alphabet associated with a formula and the
alphabet a language is written over will always be implicit. When L ∈ L(Φ), we
say L is expressible in Φ.

Formulas φ, ψ ∈ $\mathrm{LTL}_A$ are equivalent, denoted φ ≡ ψ, if L(φ) = L(ψ). (This
assumes that their alphabets are the same.)

2.3 Fragments 

A fragment of LTL is determined by which temporal operators are allowed and
by possible restrictions on the nesting depth of these operators. When $\theta_1, \ldots, \theta_r$
are operators, then $\mathrm{TL}[\theta_1, \ldots, \theta_r]$ is the fragment of LTL where only $\theta_1, \ldots, \theta_r$
are allowed. For instance, TL[X, F] is what Cohen, Pin, and Perrin call restricted
temporal logic.

Subscripts indicate bounds on nesting depth. For instance, in $\mathrm{TL}[X_k]$ only X is
allowed and the nesting depth in X is at most k. Another example is $\mathrm{TL}[X, F, U_k]$,
where X and F are allowed without restriction and the nesting depth in U must
not be greater than k. This is also known as the k-th level of the until hierarchy.

3 Formal Languages and Semigroups 

In this section, the basics on formal languages and finite semigroups are recalled. 
See also [11], and for semigroup theory see [1]. 
3.1 Semigroups and Congruences

An equivalence relation ≡ on $A^+$ is called a congruence over A if $uv \equiv u'v'$
whenever $u \equiv u'$ and $v \equiv v'$. A congruence relation has finite index if it has a finite
number of equivalence classes. A congruence over some alphabet A saturates a
formal language if the language is a union of congruence classes.

Given a language $L \subseteq A^+$, its syntactic congruence, denoted $\equiv_L$, is the
equivalence relation on $A^+$ defined by

$u \equiv_L v \iff \forall x, y \in A^*\,(xuy \in L \leftrightarrow xvy \in L),$

which is, in fact, a congruence relation. It is the coarsest congruence saturating
L and has finite index if and only if L is regular. If L is regular and
$(A, Q, q_I, \delta, F)$ is the minimal (complete) DFA recognizing L, then

$u \equiv_L v \iff \forall q \in Q\,(\delta^+(q, u) = \delta^+(q, v)),$

where $\delta^+$ is the extended transition function, obtained by extending δ according
to $\delta^+(q, xa) = \delta(\delta^+(q, x), a)$.
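The DFA characterization yields a direct way to compute S(L): represent a word u by the tuple of states $\delta^+(q, u)$, one entry per state q, and close the letters under appending. A sketch in Python, assuming the DFA is complete and minimal and given as a dict from (state, letter) to state; the function name is hypothetical. The usage lines use the language $abA^*$, which reappears in Example 2.

```python
def syntactic_semigroup(states, alphabet, delta):
    """Elements of S(L) for the language of a complete, minimal DFA.
    A word u is represented by the tuple (delta^+(q, u) for q in states), so
    u =_L v iff their tuples coincide.  Returns {tuple: shortest word}."""
    def extend(f, a):                        # representation of u.a from that of u
        return tuple(delta[q, a] for q in f)

    elems, layer = {}, []
    for a in alphabet:                       # words of length 1
        f = extend(tuple(states), a)
        if f not in elems:
            elems[f] = a
            layer.append(f)
    while layer:                             # breadth-first: shortest words first
        next_layer = []
        for f in layer:
            for a in alphabet:
                g = extend(f, a)
                if g not in elems:
                    elems[g] = elems[f] + a
                    next_layer.append(g)
        layer = next_layer
    return elems


# Minimal DFA for L = abA* over A = {a, b}: 0 start, 1 after a, 2 accepting, 3 sink.
DELTA = {(0, "a"): 1, (0, "b"): 3, (1, "a"): 3, (1, "b"): 2,
         (2, "a"): 2, (2, "b"): 2, (3, "a"): 3, (3, "b"): 3}
print(sorted(syntactic_semigroup([0, 1, 2, 3], "ab", DELTA).values()))
# ['a', 'aa', 'ab', 'b']
```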

A semigroup is a set equipped with an associative product. The free semigroup
over a set A is the set $A^+$ with concatenation as product. A homomorphism
$h\colon A^+ \to S$ into a semigroup S induces an equivalence relation $\equiv_h$ defined
by $u \equiv_h v$ if $h(u) = h(v)$. Conversely, a congruence relation ≡ over some alphabet
A induces a homomorphism $h_\equiv\colon A^+ \to A^+/\!\equiv$ into the quotient semigroup
$A^+/\!\equiv$ defined by $h_\equiv(u) = [u]_\equiv$.

A language L over A is recognized by a homomorphism $h\colon A^+ \to S$ if L is
saturated by $\equiv_h$, that is, when $L = h^{-1}(P)$ for some set $P \subseteq S$. It is recognized
by a semigroup S if there exists a homomorphism $h\colon A^+ \to S$ which recognizes
it. The syntactic semigroup of a language $L \subseteq A^+$, denoted S(L), is defined by
$S(L) = A^+/\!\equiv_L$. Clearly, L is recognized by $h_{\equiv_L}$ and S(L), and L is regular iff
S(L) is finite.

3.2 Pseudovarieties of Semigroups 

Often enough, interesting classes of regular languages (such as all LTL-expressible
languages) are closed under boolean combinations, left and right quotients (if L
belongs to the class, then so do $a^{-1}L = \{u \mid au \in L\}$ and $La^{-1} = \{u \mid ua \in L\}$),
and inverse homomorphic images. Such classes are called varieties of formal
languages. According to Eilenberg’s theorem [3], varieties of formal languages are
in a one-to-one correspondence with pseudovarieties of finite semigroups, which 
are classes of finite semigroups closed under finite direct products, homomor- 
phic images and subsemigroups. More precisely, the following two mappings are 
inverse to each other: (1) the mapping that associates with every variety L of 
formal languages the pseudovariety of semigroups generated by the syntactic 
semigroups of the languages in L, and (2) the mapping that associates with
every pseudovariety V of finite semigroups the class of formal languages recognized
by elements of V (which is a variety of formal languages).
So studying the corresponding pseudovariety of semigroups is as good as
studying a given variety of languages. This is true even when it comes to
computational issues, because constructing the syntactic semigroup of a regular
language from a representation by a finite automaton or a regular expression is
effective. In particular, in order to show that membership in a given variety
of formal languages is decidable, it is enough to show that membership in the
corresponding pseudovariety of semigroups is decidable. And the latter is, often
enough, (much) easier.

Decidability for pseudovarieties of semigroups has been studied for quite a 
while. In certain situations, decidability is obvious, for instance, when a pseu- 
dovariety is defined by a finite set of equations. But in other situations it can be 
very difficult. In fact, there are quite a few pseudovarieties where decidability
is an open problem. Fortunately, showing decidability for the pseudovarieties
related to LTL is not that difficult. 

We will use the following terminology when working with congruences. If the
quotient semigroup of a congruence relation ≡ over some alphabet A belongs to
a pseudovariety V, we will say that ≡ is a V-congruence.

4 Simple Fragments 

We first study fragments of temporal logic which are easy to characterize, but 
are non-trivial nevertheless. Here, easy means that it is almost straightforward 
to come up with a finite set of equations (in fact, a single equation) for the 
corresponding pseudovariety of semigroups. 

4.1 Next Only 

We first look at the LTL fragment where only X is allowed, that is, we study 
TL[X]. Observe that although in TL[X] one cannot express complicated proper- 
ties, even over a fixed alphabet there are infinitely many different properties that 
can be expressed. So finding an effective characterization is not a trivial issue. 

For a start, we consider $\mathrm{TL}[X_k]$ for increasing k. The formulas in $\mathrm{TL}[X_0]$ are
the propositional formulas, and with such a formula we can only speak about the
first position of a string. In $\mathrm{TL}[X_1]$ we can also speak about the second position.
We can, for instance, say that in the second position there is a certain letter
(Xa), and we can say that there is no second position (¬Xtt). In $\mathrm{TL}[X_2]$, we can
also speak about the third position of a string, and so on. Algebraically, this means
that in the syntactic semigroup of a language expressible in $\mathrm{TL}[X_k]$ any two products
of at least k + 1 elements which agree on the first k + 1 factors are the same:

Theorem 1. Let k ≥ 0 and let $L \subseteq A^+$ be a regular language. Then the following
are equivalent:

1. L is expressible in $\mathrm{TL}[X_k]$.

2. S(L) satisfies

$s_1 \cdots s_{k+1}\, t = s_1 \cdots s_{k+1}. \qquad (X_k)$






3. L is saturated by the congruence $\equiv_k$ defined by $u \equiv_k v$ if $u = v$ or $|u|, |v| \ge k+1$
and $\mathrm{pref}_{k+1}(u) = \mathrm{pref}_{k+1}(v)$.

Proof. That 2. and 3. are equivalent is easy to see: the quotient semigroup
$A^+/\!\equiv_k$ satisfies $(X_k)$, and if a semigroup S satisfies $(X_k)$ and $h\colon A^+ \to S$ is
a homomorphism, then $\equiv_k \subseteq \equiv_h$.

To complete the proof we show the equivalence between 1. and 3. For the
implication from 3. to 1. we only need to show that every equivalence class of
$\equiv_k$ is expressible in $\mathrm{TL}[X_k]$. This is simple: if $|u| \le k$, then $[u]_{\equiv_k}$, which contains
just u, is defined by $u_1 \wedge X(u_2 \wedge X(u_3 \wedge \cdots \wedge X(u_{|u|} \wedge \neg X\mathrm{tt})))$; if $|u| \ge k+1$, then
$[u]_{\equiv_k}$ is defined by $u_1 \wedge X(u_2 \wedge X(u_3 \wedge \cdots \wedge X u_{k+1}))$.

The proof of the implication from 1. to 3. goes by induction on k. The base
case, k = 0, is simple: a language $L \subseteq A^+$ is expressible in $\mathrm{TL}[X_0]$ iff $L = BA^*$ for
some $B \subseteq A$, and all these languages are saturated by $\equiv_0$. In the inductive step,
assume we are given a formula $\varphi \in \mathrm{TL}[X_{k+1}]$. Then φ is a boolean combination
of propositional formulas and formulas of the form Xψ where $\psi \in \mathrm{TL}[X_k]$. So,
by the induction hypothesis, L(φ) is a boolean combination of languages of the
form $BA^*$ and $A[u]_{\equiv_k}$, where $B \subseteq A$ and $[u]_{\equiv_k}$ is a $\equiv_k$-class. Clearly, each
language of the first type is saturated by $\equiv_{k+1}$. Moreover, it is easy to see
that $A[u]_{\equiv_k} = \bigcup_{a \in A} [au]_{\equiv_{k+1}}$; hence, every language of the second type is also
saturated by $\equiv_{k+1}$. □

Example 2. Consider $L = abA^*$ where A = {a, b}. The syntactic congruence of L
has 4 classes: a, $aaA^*$, $bA^*$, $abA^*$. The multiplication table of S(L) (row s,
column t, entry st) is:

        a     b     aa    ab
  a     aa    ab    aa    aa
  b     b     b     b     b
  aa    aa    aa    aa    aa
  ab    ab    ab    ab    ab

where the elements are identified with shortest representatives of the
corresponding equivalence classes. Clearly, $(X_0)$ is not satisfied, because aa is different
from a. On the other hand, it is easy to see that $(X_1)$ is satisfied. So L should
be definable by a $\mathrm{TL}[X_1]$ formula. It is simple to come up with such a formula:
a ∧ Xb. □
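The two claims of Example 2 can be confirmed by brute force over the table. In the sketch below the table is typed in by hand from the example, and `satisfies_Xk` is a hypothetical helper that tries every choice of $s_1, \ldots, s_{k+1}, t$.

```python
from itertools import product

# Multiplication table of S(L) for L = abA*; entry (s, t) is the product st.
ELEMS = ["a", "b", "aa", "ab"]
ROWS = {"a": ["aa", "ab", "aa", "aa"], "b": ["b"] * 4,
        "aa": ["aa"] * 4, "ab": ["ab"] * 4}
MUL = {(s, t): ROWS[s][j] for s in ELEMS for j, t in enumerate(ELEMS)}


def satisfies_Xk(elems, mul, k):
    """Check (X_k): s_1 ... s_{k+1} t = s_1 ... s_{k+1} for all choices of the s_i and t."""
    for ss in product(elems, repeat=k + 1):
        prefix = ss[0]
        for s in ss[1:]:
            prefix = mul[prefix, s]
        if any(mul[prefix, t] != prefix for t in elems):
            return False
    return True


print(satisfies_Xk(ELEMS, MUL, 0), satisfies_Xk(ELEMS, MUL, 1))   # False True
```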

Clearly, a characterization of the pseudovariety corresponding to full TL[X]
cannot be given in terms of a finite number of equations. In order to arrive at
an effective characterization nevertheless, we allow a new operation to be used
in equations; so far, equations only use the binary product of the
underlying semigroups.

Let S be a finite semigroup. Then, for every s ∈ S, there is a unique element
$e \in \{s, s^2, s^3, \ldots\}$ such that $e^2 = e$. That is, in the subsemigroup generated
by s there is a unique idempotent element. This element will be denoted by $s^\omega$. Clearly,
when S and s are given, $s^\omega$ can be determined easily. So if we allow $(\cdot)^\omega$
to be used in equations for defining classes of finite semigroups (this is what we
do from now on), then these equations can still be verified effectively.
The key observation for finding an equation for full TL[X] is that in a finite
semigroup idempotent elements represent “long” strings. More precisely, in a
finite semigroup S every element which can be written as a product of at least
|S| elements can also be written in the form $s'es''$ where e is idempotent (for
any product $s_1 \cdots s_n$ consider the partial products $s_1, s_1s_2, \ldots$ and use the
pigeon-hole principle). Using this, we obtain:

Lemma 3. A finite semigroup satisfies one of the equations $(X_k)$ if and only if
it satisfies

$s^\omega t = s^\omega. \qquad (X)$

Proof. Let S be a finite semigroup. If S satisfies $(X_k)$, then S satisfies (X): we
have $s^\omega t = (s^\omega)^{k+1} t = (s^\omega)^{k+1} = s^\omega$, where the middle equality is due to $(X_k)$.

Conversely, in a semigroup S with n elements we have
$s_1 \cdots s_n t = s'es''t = s'e^\omega s''t = s'e^\omega = s'e^\omega s'' = s_1 \cdots s_n$ for appropriate
elements $s', s'' \in S$, e idempotent, and any choice of $s_1, \ldots, s_n, t \in S$. Hence S
satisfies $(X_{n-1})$. □

Example 4. Consider L from Example 2 again. Clearly, b, aa, and ab are the
only idempotent elements, and all of them are left zeros. So (X) is satisfied. □
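Computing $s^\omega$ and checking (X) is equally mechanical. A sketch in Python, again with the table of Example 2 typed in by hand; `omega` walks through $s, s^2, s^3, \ldots$ and stops at the first idempotent, which by uniqueness is $s^\omega$.

```python
ELEMS = ["a", "b", "aa", "ab"]
ROWS = {"a": ["aa", "ab", "aa", "aa"], "b": ["b"] * 4,
        "aa": ["aa"] * 4, "ab": ["ab"] * 4}
MUL = {(s, t): ROWS[s][j] for s in ELEMS for j, t in enumerate(ELEMS)}


def omega(s, mul):
    """The idempotent power s^omega: the first idempotent among s, s^2, s^3, ..."""
    p = s
    while mul[p, p] != p:
        p = mul[p, s]
    return p


def satisfies_X(elems, mul):
    """Check (X): s^omega t = s^omega for all s, t."""
    return all(mul[omega(s, mul), t] == omega(s, mul) for s in elems for t in elems)


print([s for s in ELEMS if MUL[s, s] == s])      # ['b', 'aa', 'ab']
print(omega("a", MUL), satisfies_X(ELEMS, MUL))  # aa True
```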

We conclude with a summary of the results: 

Theorem 5. 1. For every k, the fragment $\mathrm{TL}[X_k]$ is effectively characterized
by $(X_k)$.

2. The fragment TL[X] is effectively characterized by (X).

4.2 Eventually Only 

In this section, we look at a more complicated class of temporal formulas. We 
want to characterize the languages that are expressible in TL[F]. 

With nesting depth 0 one can specify that a string starts with a certain letter.
With nesting depth 1 one can specify in addition that certain letters occur after
the first position (Fa) and others do not (¬Fa). With nesting depth 2 one can, for
instance, specify in addition that after the first position a certain letter occurs
and after that one certain letters do not occur any more ($\mathrm{F}(a \wedge \neg\mathrm{F}a_1 \wedge \cdots \wedge \neg\mathrm{F}a_n)$),
and other related properties.

We want to phrase algebraically what all the above properties have in
common. Assume we are given a set Φ of properties that can be expressed in TL[F],
say the set of all properties of nesting depth at most k. Suppose a string u
and one of its suffixes, say $\mathrm{suf}_i(u)$, satisfy exactly the same subset Φ' of Φ.
Then it should be clear that every string “between” u and $\mathrm{suf}_i(u)$ starting with
the same letter as u and $\mathrm{suf}_i(u)$, that is, every $\mathrm{suf}_j(u)$ with $1 < j < i$ and $u_1 = u_j$
($= u_i$), should satisfy exactly Φ'.

This can be phrased very easily in terms of a universal Horn sentence (and 
in terms of an equation): 
Theorem 6. Let $L \subseteq A^+$ be a regular language. Then the following are
equivalent.

1. L is expressible in TL[F].

2. S(L) satisfies

$stu = u \;\rightarrow\; rtu = ru. \qquad (5)$

(Observe that on the right-hand side we could also write rstu = rtu = ru.)

3. S(L) satisfies

$rs(ts)^\omega = r(ts)^\omega. \qquad (6)$

In the proof, we use the following equivalence relation, which parametrizes
the TL[F]-expressible properties. For every k, let $\equiv_k$ be the equivalence relation with
$u \equiv_k v$ if u and v satisfy the same $\mathrm{TL}[F_k]$ properties. This equivalence relation
is in fact a congruence relation. We will use the following inductive description
of $\equiv_k$:

Lemma 7. Let u and v be arbitrary non-empty strings over A. Then:

1. $u \equiv_0 v$ iff $u_1 = v_1$.

2. $u \equiv_{k+1} v$ iff

(a) $u_1 = v_1$,

(b) for every i with $1 < i \le |u|$ there exists j with $1 < j \le |v|$ such that
$\mathrm{suf}_i(u) \equiv_k \mathrm{suf}_j(v)$, and

(c) for every i with $1 < i \le |v|$ there exists j with $1 < j \le |u|$ such that
$\mathrm{suf}_i(v) \equiv_k \mathrm{suf}_j(u)$. □
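Lemma 7 can be turned into a canonical "type" for each word: level 0 keeps the first letter, and level k + 1 adds the set of level-k types of the proper suffixes, which encodes conditions (b) and (c) at once. The sketch below (hypothetical names) compares words by these types; the last usage line instantiates the words $rt(st)^k u$ and $r(st)^k u$ from the proof of Theorem 6 with r = s = u = a and t = b.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def f_type(u, k):
    """Canonical representative of the =_k class of the non-empty word u (Lemma 7)."""
    if k == 0:
        return u[0]                                   # first letter only
    return (u[0], frozenset(f_type(u[i:], k - 1) for i in range(1, len(u))))


def equiv(u, v, k):
    return f_type(u, k) == f_type(v, k)


print(equiv("ab", "abb", 1), equiv("ab", "aab", 1))          # True False
print(all(equiv("ab" + "ab" * k + "a", "a" + "ab" * k + "a", k) for k in range(4)))  # True
```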

This can be proved by a straightforward induction on k; it simply formalizes 
what was informally described at the beginning of this subsection. 

Proof of Theorem 6. It is easy to see that (5) and (6) are equivalent for every
finite semigroup. First, assume (5) holds. Let s and t be arbitrary semigroup
elements and m such that $(ts)^m = (ts)^\omega$. Then $(ts)^{2m-1}t \cdot s \cdot (ts)^m = (ts)^{3m} = (ts)^m$, so,
using (5), $rs(ts)^m = r(ts)^m$, hence $rs(ts)^\omega = r(ts)^\omega$. Conversely, if stu = u,
then $(st)^m u = u$ for every m ≥ 1; in particular, $(st)^\omega u = u$, which, by (6), implies
$rt(st)^\omega u = r(st)^\omega u = ru$. But $t(st)^\omega u = tu$ anyway, so rtu = ru.

To conclude the proof we show that 1. and 2. are equivalent. The proof
that 1. implies 2. is by induction on k. It is enough to show that the quotient
semigroups $A^+/\!\equiv_k$ satisfy (5). For k = 0, this is trivial. So let k > 0 and assume
$stu \equiv_k u$. Clearly, rtu and ru start with the same letter, so 2.(a) from Lemma 7 holds.
Let $\varphi \in \mathrm{TL}[F_{k-1}]$. First, suppose $\mathrm{suf}_i(ru) \models \varphi$ for some i with $1 < i \le |ru|$.
We distinguish two cases. If $i \le |r|$, then $r'u \models \varphi$ for a suffix r' of r, and the
induction hypothesis yields $r'tu \models \varphi$, which shows $\mathrm{suf}_i(rtu) \models \varphi$. If $i > |r|$, then
$\mathrm{suf}_{i-|r|}(u) \models \varphi$, so $\mathrm{suf}_{i-|r|+|t|}(tu) \models \varphi$, which also means $\mathrm{suf}_{i+|t|}(rtu) \models \varphi$.
So, in both cases, $\mathrm{suf}_j(rtu) \models \varphi$ for some j with $1 < j \le |rtu|$, that is, 2.(b)
from Lemma 7 holds. In the same fashion, one proves that 2.(c) holds. This shows
$rtu \equiv_k ru$.

For the implication from 2. to 1., we again use the above characterization of
$\equiv_k$. Assume (5) does not hold for the syntactic semigroup of some language L.
Then there exist strings r, s, t, u such that $stu \equiv_L u$ and $rtu \not\equiv_L ru$. As a
consequence, we have $rt(st)^k u \not\equiv_L r(st)^k u$ for every k. By definition of $\equiv_L$, this
means that for every k there exist $x, y \in A^*$ such that $xrt(st)^k uy \in L$ and
$xr(st)^k uy \notin L$, or $xrt(st)^k uy \notin L$ and $xr(st)^k uy \in L$. By a simple induction,
one shows $xrt(st)^k uy \equiv_k xr(st)^k uy$. So no formula in $\mathrm{TL}[F_k]$ can define L. Since
this is true for every k, L cannot be expressed in TL[F]. □

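Example 8. Consider $L = A^*aA^*b$ where A = {a, b, c}. The classes of $\equiv_L$
are: $A^*a \cup A^*aA^*c$, $A^*aA^*b$, $(b+c)^*b$, $(b+c)^*c$. (A word ending in a and a word
containing an a and ending in c are $\equiv_L$-equivalent: for both, $xwy \in L$ iff y is
non-empty and ends in b.) And the multiplication table of S(L) (row s, column t,
entry st) is:

        a     b     c     ab
  a     a     ab    a     ab
  b     a     b     c     ab
  c     a     b     c     ab
  ab    a     ab    a     ab

The elements a and ab are right zeros, so it is impossible to violate (6) with
$ts \in \{a, ab\}$. But $(ts)^\omega \notin \{a, ab\}$ iff $t, s \in \{b, c\}$, and in the subsemigroup
{b, c}, b and c are right zeros. So S(L) satisfies (6). This is what we expect,
because L is defined by $(a \wedge \mathrm{F}(b \wedge \neg\mathrm{F}\mathrm{tt})) \vee \mathrm{F}(a \wedge \mathrm{F}(b \wedge \neg\mathrm{F}\mathrm{tt}))$. □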

Example 9. Consider the language L from Example 2. The second letter of any
word in L is fixed to b. This should not be expressible in TL[F]. Our decision
procedure yields the expected result: in S(L), we have $aa = aa(ba)^\omega \ne a(ba)^\omega = ab$,
thus (6) does not hold in S(L). □
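To run the decision procedure end to end, the sketch below builds S(L) directly from a complete minimal DFA (elements are the state transformations induced by words, multiplied by applying the left factor first) and checks (6) by brute force over all triples. The two DFAs are written by hand for the languages of Examples 8 and 9 and are assumptions of this sketch, as are all names; elements are transformation tuples, not the word names used in the tables.

```python
from itertools import product


def semigroup_from_dfa(states, alphabet, delta):
    """Elements of S(L) for a complete minimal DFA, as state-transformation tuples,
    together with their product (apply the left factor first, then the right)."""
    idx = {q: i for i, q in enumerate(states)}

    def mul(f, g):
        return tuple(g[idx[q]] for q in f)

    gens = [tuple(delta[q, a] for q in states) for a in alphabet]
    elems, todo = set(gens), list(gens)
    while todo:                                  # close the letters under product
        f = todo.pop()
        for g in gens:
            h = mul(f, g)
            if h not in elems:
                elems.add(h)
                todo.append(h)
    return elems, mul


def omega(s, mul):
    p = s
    while mul(p, p) != p:                        # first idempotent power of s
        p = mul(p, s)
    return p


def satisfies_6(elems, mul):
    """Brute-force check of (6): r s (ts)^omega = r (ts)^omega for all r, s, t."""
    return all(mul(mul(r, s), omega(mul(t, s), mul)) == mul(r, omega(mul(t, s), mul))
               for r, s, t in product(elems, repeat=3))


# Example 9: L = abA* (0 start, 1 after a, 2 accepting, 3 sink).
EX2 = ([0, 1, 2, 3], "ab", {(0, "a"): 1, (0, "b"): 3, (1, "a"): 3, (1, "b"): 2,
                            (2, "a"): 2, (2, "b"): 2, (3, "a"): 3, (3, "b"): 3})
# Example 8: L = A*aA*b ("n": no a yet, "s": a seen but not ending in b, "f": accepting).
EX8 = (["n", "s", "f"], "abc", {("n", "a"): "s", ("n", "b"): "n", ("n", "c"): "n",
                                ("s", "a"): "s", ("s", "b"): "f", ("s", "c"): "s",
                                ("f", "a"): "s", ("f", "b"): "f", ("f", "c"): "s"})
print(satisfies_6(*semigroup_from_dfa(*EX8)))   # True
print(satisfies_6(*semigroup_from_dfa(*EX2)))   # False
```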

5 Composing Classes 

Many fragments of LTL are more difficult to characterize than TL[X] and TL[F], 
for instance, TL[X, F], which is known as restricted temporal logic. For fragments 
like this one, a compositional approach should be taken. The key concepts are 
substitution on the LTL side and semidirect products on the semigroup side. 

5.1 Substitution 

It is quite easy to see that every formula in TL[X, F] is equivalent to a formula 
in that fragment where F does not occur in the scope of X. So X can always be 
pushed inside. We describe this more formally using substitution. 

A function $\sigma\colon \mathrm{LTL}_S \to \mathrm{LTL}_\Gamma$ is a substitution if σ maps each formula $\varphi \in \mathrm{LTL}_S$
to the formula which is obtained from φ by replacing every occurrence of a
propositional variable $p \in S$ by σ(p). Note that a substitution is fully determined
by its values for the propositional variables.

When Ψ is a class of LTL formulas, then $\sigma\colon \mathrm{LTL}_S \to \mathrm{LTL}_\Gamma$ is a Ψ substitution
if $\sigma(p) \in \Psi$ for every $p \in S$. When Φ and Ψ are classes of formulas, then
$\Phi \circ \Psi$ is the set of all formulas σ(φ) where $\varphi \in \Phi$ is a formula over some S
and $\sigma\colon \mathrm{LTL}_S \to \mathrm{LTL}_\Gamma$ is a Ψ substitution for some Γ. Clearly, substitution is
associative, so it is not necessary to use parentheses.

The above remark about TL[X, F] can now be rephrased as follows. Every formula in TL[X, F] is equivalent to a formula in TL[F] $\circ$ TL[X].

The most basic lemma about the semantics of LTL is the analogue of the substitution lemma for first-order logic, which is described in what follows. Let $\sigma\colon \mathrm{LTL}_\Sigma \to \mathrm{LTL}_\Gamma$ be a substitution. For each $u \in (2^\Gamma)^+$ of length $n$, let $\sigma(u)$ be the string $a_1 \dots a_n$ with

$$ a_i = \{p \in \Sigma \mid \mathrm{suf}_i(u) \models \sigma(p)\}. \tag{7} $$

So the $i$-th letter of $\sigma(u)$ encodes which formulas $\sigma(p)$ hold in $\mathrm{suf}_i(u)$ and which do not.

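The substitution lemma now splits the problem $u \models \sigma(\varphi)$ into two problems:

Lemma 10 (LTL substitution lemma). Let $\varphi$ be an LTL formula over $\Sigma$ and $\sigma\colon \mathrm{LTL}_\Sigma \to \mathrm{LTL}_\Gamma$ a substitution. Then, for every $u \in (2^\Gamma)^+$,

$$ u \models \sigma(\varphi) \quad\text{iff}\quad \sigma(u) \models \varphi. \tag{8} $$

□

The proof of this lemma is a simple induction on the structure of $\varphi$.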

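Example 11. Let $\varphi = \mathrm{F}(a \wedge \mathrm{X}b)$ where $A = \{a, b\}$, that is, $L(\varphi) = A^+abA^*$. Identify $A$ with $2^\Gamma$ where $\Gamma = \{q\}$; in particular, assume $a = \{q\}$ and $b = \emptyset$. Let $\Sigma = \{p\}$. Then $\varphi = \sigma(\mathrm{F}p)$ with $\sigma\colon p \mapsto a \wedge \mathrm{X}b$; more precisely, $\sigma(p) = q \wedge \mathrm{X}\neg q$. Clearly, $bbbaba \models \varphi$. Using the above lemma, we can verify this as follows. First, $\sigma(bbbaba) = \emptyset\emptyset\emptyset\{p\}\emptyset\emptyset$, and the latter satisfies $\mathrm{F}p$. □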
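The computation in Example 11 can be replayed with a small evaluator for LTL over finite nonempty words. This is a minimal sketch under stated assumptions: the tuple encoding of formulas is our own, and F is read as strict future (consistent with $L(\varphi) = A^+abA^*$). It builds $\sigma(u)$ as in (7) and checks the equivalence (8) exhaustively on short words.

```python
from itertools import product

# Assumed formula encoding (illustrative only):
#   ("var", p) | ("not", f) | ("and", f, g) | ("X", f) | ("F", f)
# F is read as strict future, consistent with L(F(a ∧ Xb)) = A^+abA*.


def holds(u, i, f) -> bool:
    """Does the suffix of u starting at position i satisfy f? (u: list of sets)"""
    op = f[0]
    if op == "var":
        return f[1] in u[i]
    if op == "not":
        return not holds(u, i, f[1])
    if op == "and":
        return holds(u, i, f[1]) and holds(u, i, f[2])
    if op == "X":
        return i + 1 < len(u) and holds(u, i + 1, f[1])
    if op == "F":
        return any(holds(u, j, f[1]) for j in range(i + 1, len(u)))
    raise ValueError(f"unknown operator {op!r}")


def substitute(f, sigma):
    """sigma(f): replace every variable p in f by sigma[p]."""
    if f[0] == "var":
        return sigma[f[1]]
    return (f[0],) + tuple(substitute(g, sigma) for g in f[1:])


def sigma_word(u, sigma):
    """The word sigma(u) of (7): letter i is {p | suf_i(u) |= sigma(p)}."""
    return [{p for p, g in sigma.items() if holds(u, i, g)} for i in range(len(u))]


# Example 11: a = {q}, b = {} and sigma(p) = q ∧ X¬q.
A, B = {"q"}, set()
SIGMA = {"p": ("and", ("var", "q"), ("X", ("not", ("var", "q"))))}
FP = ("F", ("var", "p"))


def check_lemma(max_len: int = 6) -> bool:
    """Exhaustively test (8) for F p and SIGMA on all words of length <= max_len."""
    return all(
        holds(list(w), 0, substitute(FP, SIGMA)) == holds(sigma_word(list(w), SIGMA), 0, FP)
        for n in range(1, max_len + 1)
        for w in product([A, B], repeat=n)
    )


u = [B, B, B, A, B, A]                        # the word bbbaba
print(sigma_word(u, SIGMA))                   # [set(), set(), set(), {'p'}, set(), set()]
print(holds(u, 0, substitute(FP, SIGMA)))     # True
print(check_lemma())                          # True
```

The exhaustive check is of course no proof; it only illustrates that the two sides of (8) agree for this particular substitution.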

5.2 Semidirect Product/Substitution Principle

Suppose we are given two LTL fragments $\Phi$ and $\Psi$ and effective characterizations of them in terms of finite semigroups, that is, we are given two pseudovarieties $V$ and $W$ such that $L(\Phi) = L(V)$ and $L(\Psi) = L(W)$. We would like to have a characterization of $\Phi \circ \Psi$. To obtain a good picture of what this operation should do, we reconsider the substitution lemma.

There is a straightforward way to model $\sigma(u)$ in an algebraic setting. Assume we are given a congruence $\equiv$ that describes, in some sense, the semantics of the formulas $\sigma(p)$; say, it saturates all the languages defined by these formulas. Further, assume $u$ is a nonempty string over $A$. Then the string $\sigma(u)$ corresponds to the string $u^\equiv$ defined by $u^\equiv = a_1 \dots a_{|u|}$ where $a_i = [\mathrm{suf}_i(u)]_\equiv$. Note that $u^\equiv$ should strictly be viewed as a string over the semigroup $A^+/{\equiv}$.

From the above it is clear that it would be desirable to have at hand an algebraic operation which, given pseudovarieties $V$ and $W$, yields a new pseudovariety generated by all quotients of the form $A^+/{\equiv'''}$ where $\equiv'''$ is the congruence relation generated by an equivalence relation $\equiv''$ obtained as follows.

(†) For some $W$-congruence $\equiv$ over $A$ and some $V$-congruence $\equiv'$ on $(A^+/{\equiv})^+$, the relation $\equiv''$ relates $u$ and $v$ if $u^\equiv \equiv' v^\equiv$.

This would allow us to combine $V$ and $W$ in such a way that the resulting pseudovariety corresponds to $L(\Phi \circ \Psi)$.

Example 12. Reconsider Example 11. Let $\equiv$ denote the syntactic congruence of $L(a \wedge \mathrm{X}b)$, which we already know from Example 2. Suppose $u = aabbba$ and $v = bbbabab$. Then $u^\equiv = [aa][ab][b][b][b][a]$ and $v^\equiv = [b][b][b][ab][b][ab][b]$. Let $\equiv'$ be the syntactic congruence of the language $S(L)^+[ab]S(L)^*$. Then $u \equiv'' v$. □
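The strings $u^\equiv$ and $v^\equiv$ of Example 12 can be computed suffix by suffix. In the sketch below, the description of the classes of $L(a \wedge \mathrm{X}b) = abA^*$ (first letter, plus the second letter for words starting with $a$) is our own derivation and should be read as an assumption; `lift` and `in_outer` are hypothetical helper names.

```python
def syn_class(w: str) -> str:
    """Class of a nonempty word over {a, b} in the syntactic semigroup of abA*.

    Assumption (our own derivation): the class is determined by the first
    letter and, for words starting with 'a', by the second letter.
    """
    if w[0] == "b":
        return "b"
    if len(w) == 1:
        return "a"
    return "ab" if w[1] == "b" else "aa"


def lift(u: str) -> list:
    """u^≡: letter i is the class of the suffix of u starting at position i."""
    return [syn_class(u[i:]) for i in range(len(u))]


def in_outer(s: list) -> bool:
    """Membership in S(L)^+ [ab] S(L)^*: [ab] occurs at a non-initial position."""
    return "ab" in s[1:]


u, v = "aabbba", "bbbabab"
print(lift(u))                               # ['aa', 'ab', 'b', 'b', 'b', 'a']
print(lift(v))                               # ['b', 'b', 'b', 'ab', 'b', 'ab', 'b']
print(in_outer(lift(u)), in_outer(lift(v)))  # True True
```

Since both lifted strings already contain $[ab]$ at a non-initial position, every extension by a context does too, which is why the syntactic congruence of $S(L)^+[ab]S(L)^*$ does not separate them.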

As it turns out, there is an algebraic operation on pseudovarieties that is very close to what we need: the pseudovariety usually denoted $(V^r * W^r)^r$. Here, $(\cdot)^r$ means that the multiplication tables are transposed, and $*$ stands for the semidirect product of pseudovarieties (see below).

The exact description of $(V^r * W^r)^r$ is as follows [16]. It is the pseudovariety generated by semigroups $A^+/{\equiv''}$ where $\equiv''$ is a congruence obtained as follows.

(‡) For some $W$-congruence $\equiv$ over $A$ and some $V$-congruence $\equiv'$ over $A \times A^+/{\equiv}$, the relation $\equiv''$ relates $u$ and $v$ if

1. $u \equiv v$ and

2. for every $x \in A^*$, $u^x \equiv' v^x$, with $w^x$ defined by $w^x = a_1 \dots a_{|w|}$ where $a_i = (w_i, [w_{i+1} \dots w_{|w|}x]_\equiv)$.

(In the above, special care has to be taken when $w_{i+1} \dots w_{|w|}x$ is the empty string.)

Observe that (†) and (‡) deviate only slightly from each other, but the minor difference between the two has some consequences, which will be explained in what follows.

We need some more notation. We write $L'(V)$ for the class of all $L \subseteq A^+$ such that $L$ is a finite union of languages of the form $aL'$ with $a \in A$ and $L'$ recognized by a monoid homomorphism $A^* \to S^1$ with $S \in V$. Here, $S^1$ is $S$ augmented by a neutral element if $S$ contains no neutral element, and $S$ otherwise. And we write $\Phi \circ' \Psi$ for the set of all Boolean combinations of formulas from $(\Phi \circ \Psi) \cup \Psi$. With this notation, we get:

Theorem 13 (semidirect product substitution principle [17]). Let $\Phi$ and $\Psi$ be classes of LTL formulas and $V$ and $W$ be pseudovarieties of semigroups such that $L(\Phi) = L(V)$ and $L(\Psi) = L'(W)$. Then

$$ L(\Phi \circ' \Psi) = L(V *^r W), \tag{9} $$

where $V *^r W = (V^r * W^r)^r$.

One simple application is given in the following example.

Example 14. We want to derive an effective characterization of TL[X, F]. We already know that this amounts to obtaining an effective characterization of TL[F] $\circ$ TL[X]. Clearly, TL[F] $\circ'$ TL[X] is at least as expressive as TL[F] $\circ$ TL[X] and not more expressive than TL[X, F]. So, by the semidirect product substitution principle, we get

$$ L(\mathrm{TL}[\mathrm{X}, \mathrm{F}]) = L(L' *^r K), \tag{10} $$

where $L'$ is defined by (6) and $K$ by (X). □

It follows from results in semigroup theory [15] that the above pseudovariety 
is decidable, which means we have an effective characterization of TL[X, F]. 

A first effective characterization was originally obtained by Cohen, Perrin, and Pin [2] without using the semidirect product substitution principle. They showed

$$ L(\mathrm{TL}[\mathrm{X}, \mathrm{F}]) = L(\mathbf{LL}) \tag{11} $$

where $\mathbf{LL}$ is the pseudovariety of locally $\mathcal{L}$-trivial semigroups defined by

$$ r^\omega s (r^\omega t r^\omega s)^\omega = (r^\omega t r^\omega s)^\omega, \tag{12} $$

which, by Eilenberg's theorem, is the same as $L' *^r K$.

5.3 Semidirect Product

For the sake of completeness, we conclude this section with a formal definition of the semidirect product of two semigroups. Given semigroups $S$ and $T$ and a monoid homomorphism $h\colon T^1 \to \mathrm{End}(S)$ from $T^1$ to the monoid of endomorphisms of $S$, the semidirect product of $S$ with $T$ (with respect to $h$), denoted $S *_h T$, is the set $S \times T$ with multiplication defined by $(s_1, t_1)(s_2, t_2) = (s_1 \cdot s_2(t_1 h), t_1 t_2)$. The interested reader may want to check how this relates to the constructions in (†) and (‡).

Given pseudovarieties $V$ and $W$ of semigroups, their semidirect product is the pseudovariety of semigroups generated by all semigroups of the form $S *_h T$ where $S \in V$, $T \in W$, and $h\colon T^1 \to \mathrm{End}(S)$ is a monoid homomorphism; it is denoted by $V * W$. This product is associative, whereas the semidirect product of individual semigroups is not.
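To see the multiplication rule of $S *_h T$ at work, here is a minimal sketch with a concrete choice that is ours, not the text's: $S = (\mathbb{Z}_3, +)$, $T = (\{1, -1\}, \cdot)$ (already a monoid, so $T^1 = T$), and $t h$ the endomorphism "multiply by $t$ modulo 3". Because $T$ is commutative, left and right action conventions coincide here. The brute-force checks confirm that the product is associative and, unlike $S$ and $T$, not commutative.

```python
from itertools import product

# Illustrative choice (not from the text): S = (Z_3, +), T = ({1, -1}, *),
# and t h = "multiply by t mod 3", an endomorphism of Z_3.
S = range(3)
T = (1, -1)


def act(s: int, t: int) -> int:
    """s(t h): apply the endomorphism t h to s."""
    return (t * s) % 3


def mul(x, y):
    """(s1, t1)(s2, t2) = (s1 + s2(t1 h), t1 t2)."""
    (s1, t1), (s2, t2) = x, y
    return ((s1 + act(s2, t1)) % 3, t1 * t2)


ELEMS = list(product(S, T))


def is_associative() -> bool:
    return all(mul(mul(x, y), z) == mul(x, mul(y, z))
               for x in ELEMS for y in ELEMS for z in ELEMS)


def is_commutative() -> bool:
    return all(mul(x, y) == mul(y, x) for x in ELEMS for y in ELEMS)


print(len(ELEMS), is_associative(), is_commutative())   # 6 True False
```

This particular product is the dihedral group of order 6; the example only illustrates the definition and says nothing about the pseudovarieties used above.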

6 Conclusion 

Using the semidirect product substitution principle, one can easily characterize many other fragments of LTL. For instance, one gets a parametrized version of Theorem 6:

$$ L(\mathrm{TL}[\mathrm{F}_k]) = L(SL *^r \cdots *^r SL), \tag{13} $$

where the product has $k$ factors and $SL$ is the pseudovariety of semilattices defined by

$$ st = ts, \qquad ss = s. \tag{14} $$
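The identities (14) can be tested directly on a finite multiplication. The checker below is a small sketch; the two sample semigroups (subsets under union, and $\mathbb{Z}_2$ under addition) are our own examples, not taken from the text.

```python
from itertools import product


def satisfies_sl(elems, mul) -> bool:
    """Check the identities (14), st = ts and ss = s, on a finite semigroup."""
    commutative = all(mul(s, t) == mul(t, s) for s, t in product(elems, repeat=2))
    idempotent = all(mul(s, s) == s for s in elems)
    return commutative and idempotent


# Our own examples: subsets of {1, 2} under union form a semilattice;
# Z_2 under addition does not, since 1 + 1 = 0.
subsets = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
print(satisfies_sl(subsets, lambda s, t: s | t))        # True
print(satisfies_sl([0, 1], lambda s, t: (s + t) % 2))   # False
```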

The most interesting question is to classify an LTL property according to how deeply U, the only binary operator, needs to be nested to express the property. In other words, one is interested in an effective characterization of the classes

$$ L(\mathrm{TL}[\mathrm{X}, \mathrm{F}, \mathrm{U}_0]) \subseteq L(\mathrm{TL}[\mathrm{X}, \mathrm{F}, \mathrm{U}_1]) \subseteq L(\mathrm{TL}[\mathrm{X}, \mathrm{F}, \mathrm{U}_2]) \subseteq \cdots, \tag{15} $$

which is known as the until hierarchy of linear temporal logic. In Example 14, we have seen how level 0 can be characterized. In [17] it is shown that

$$ L(\mathrm{TL}[\mathrm{X}, \mathrm{F}, \mathrm{U}_k]) = L(L' *^r MD_1 *^r \cdots *^r MD_1 *^r K) \tag{16} $$

with $k$ factors $MD_1$, where $MD_1$ is the pseudovariety generated by the three-element monoid $\{1, a, b\}$ determined by $ax = a$ and $bx = b$ for all $x \in \{1, a, b\}$.

All the above characterizations are effective, that is, the respective pseudovarieties are decidable. The decidability proofs are quite involved; they rely, for instance, on results by Straubing [15] and Steinberg [14].

We have seen that decidability of fragments of LTL can be reduced to decidability of iterated semidirect products of pseudovarieties of semigroups. In general, it is not at all clear whether products of decidable pseudovarieties are decidable. For the products that are relevant to characterizing fragments of LTL, the known decidability criteria (deep results from semigroup theory) yield positive results. These criteria are, however, quite involved. In particular, they do not allow one to prove reasonable upper bounds on the complexity of the decision problems.
Abstract. We establish refined search tree techniques for the parameterized DOMINATING SET problem on planar graphs. We derive a fixed parameter algorithm with running time $O(8^k n)$, where $k$ is the size of the dominating set and $n$ is the number of vertices in the graph. For our search tree, we first provide a set of reduction rules. Second, we prove an intricate branching theorem based on the Euler formula. In addition, we give an example graph showing that the bound of the branching theorem is optimal with respect to our reduction rules. Our final algorithm is very easy to implement; its analysis, however, is involved.

Keywords: dominating set, planar graph, fixed parameter algorithm, search tree

1 Introduction 

The parameterized DOMINATING SET problem, where we are given a graph $G = (V, E)$ and a parameter $k$ and ask for a set of at most $k$ vertices that dominate all other vertices, is known to be W[2]-complete for general graphs [8]. The class W[2] formalizes intractability from the point of view of parameterized complexity. It is well known that the problem restricted to planar graphs is fixed parameter tractable. An algorithm running in time $O(11^k n)$ was claimed in [7,8]. The analysis of the algorithm, however, turned out to be flawed; hence, this paper seems to give the first completely correct analysis of a fixed parameter algorithm for dominating set on planar graphs with running time $O(c^k n)$ for a small constant $c$, which even improves the previously claimed constant considerably. We mention in passing that in companion work various approaches that yield algorithms of running time $O(c^{\sqrt{k}} n)$ for planar dominating set and related problems were considered (see [1,2,3]).¹ Interestingly, very recently it was shown that up to the constant $c$ this time bound is optimal unless a very unlikely collapse of a parameterized complexity hierarchy occurs [4,5].

* Supported by the Deutsche Forschungsgemeinschaft (DFG), research project PEAL (Parameterized complexity and Exact ALgorithms), NI 369/1-1.

¹ The huge worst case constant $c$ that was derived there is rather of theoretical interest.

Fixed parameter algorithms based on search trees. A method that has proven to yield easy and powerful fixed parameter algorithms is that of constructing a bounded search tree. Suppose we are given a graph class $\mathcal{G}$ that is closed under taking subgraphs and that guarantees a vertex of degree at most $d$ for some constant $d$.² Such graph classes are, e.g., given by bounded degree graphs or by graphs of bounded genus, and hence, in particular, by planar graphs. More precisely, an easy computation shows that, e.g., the class $\mathcal{G}(S_g)$ of graphs that are embeddable on an orientable surface $S_g$ of genus $g$ guarantees a vertex of
degree dg := [2(1 -|- + 1)] for g > 0, and do := 5. 

Consider the $k$-independent set problem on $\mathcal{G}$, where, for a given $G = (V, E) \in \mathcal{G}$, we seek an independent set of size at least $k$. For a vertex $u$ with degree at most $d$ and neighbors $N(u) := \{u_1, \dots, u_d\}$, we can choose one vertex $w \in N[u] := \{u, u_1, \dots, u_d\}$ to be in an optimal independent set and continue the search on the graph $G'$ obtained by deleting $N[w]$. This observation yields a simple $O((d + 1)^k n)$ degree-branching search tree algorithm.
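The degree-branching idea can be sketched as follows. This is an illustrative implementation, not the authors' code: the graph is an adjacency dictionary, and we branch on a vertex of minimum degree. Some maximal (hence some optimal) independent set meets $N[u]$, so trying every $w \in N[u]$ is exhaustive; on a subgraph-closed class with a degree-$d$ guarantee, the tree has at most $(d+1)^k$ leaves.

```python
def has_independent_set(adj: dict, k: int) -> bool:
    """Degree-branching search tree for k-independent set (sketch).

    adj maps each vertex to the set of its neighbours (undirected graph).
    Branch on a vertex u of minimum degree: some optimal independent set
    contains a vertex w of N[u]; take w, delete N[w], and recurse with k - 1.
    """
    if k <= 0:
        return True
    if not adj:
        return False
    u = min(adj, key=lambda x: len(adj[x]))
    for w in {u} | adj[u]:
        removed = {w} | adj[w]
        rest = {x: adj[x] - removed for x in adj if x not in removed}
        if has_independent_set(rest, k - 1):
            return True
    return False


# 5-cycle: the maximum independent set has size 2.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(has_independent_set(c5, 2), has_independent_set(c5, 3))   # True False
```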

In the case of $k$-dominating set, the situation seems more intricate. Clearly, again, either $u$ or one of its neighbors can be chosen to be in an optimal dominating set. However, removing $u$ from the graph leaves all its neighbors already dominated, but they are still suitable candidates for an optimal dominating set. This consideration leads us to formulate our search tree procedure in a more general setting, where there are two kinds of vertices in our graph.

Annotated Dominating Set

Input: A black and white graph $G = (B \uplus W, E)$ and a positive integer $k$.
Parameter: $k$

Question: Is there a choice of at most $k$ vertices $V' \subseteq V = B \uplus W$ such that, for every vertex $u \in B$, there is a vertex $u' \in N[u] \cap V'$? In other words, is there a set of at most $k$ vertices (which may be either black or white) that dominates the set of all black vertices?

In each step of the search tree, we would like to branch according to a low degree black vertex. By our assumptions on the graph class, we can guarantee the existence of a vertex $u \in B \uplus W$ with $\deg(u) \le d$. However, as long as not all vertices have degree bounded by $d$ (as is, e.g., the case for graphs of bounded genus $g$, where only the existence of a vertex of degree at most $d_g$ is known), this vertex need not be black. These considerations show that a direct $O((d + 1)^k n)$ search tree algorithm for dominating set seems out of reach for such graph classes.
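The branching step for the annotated problem can be sketched as below. This is a simplified illustration under stated assumptions: no reduction rules are applied, so the degree bound of Theorem 3 (and with it the $8^k$ bound) is not guaranteed; the branching factor is simply $\deg(u) + 1$ for the chosen black vertex $u$. Choosing $v$ dominates $N[v]$, so those vertices are whitened, while all vertices remain available as candidates. A brute-force reference (a hypothetical helper) is included for comparison.

```python
from itertools import combinations


def annotated_ds(adj: dict, black: set, k: int) -> bool:
    """Search tree for ANNOTATED DOMINATING SET without reduction rules (sketch).

    adj: vertex -> set of neighbours; black: vertices still to be dominated.
    Branch on a black vertex u of minimum degree: some solution contains a
    vertex v of N[u]; choosing v whitens N[v] and lowers k by one.
    """
    if not black:
        return True
    if k <= 0:
        return False
    u = min(black, key=lambda x: len(adj[x]))
    return any(annotated_ds(adj, black - ({v} | adj[v]), k - 1)
               for v in {u} | adj[u])


def brute_force(adj: dict, black: set, k: int) -> bool:
    """Reference check: try every vertex set of size at most k."""
    closed = {x: {x} | adj[x] for x in adj}
    return any(all(closed[b] & set(d) for b in black)
               for r in range(k + 1) for d in combinations(adj, r))


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(annotated_ds(path, set(path), 1), annotated_ds(path, set(path), 2))  # False True
print(annotated_ds(path, {0, 2}, 1))                                       # True
```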

Our results. In this paper, we present a fixed parameter algorithm for (annotated) dominating set on planar graphs with running time $O(8^k n)$. For that purpose, we provide a set of reduction rules and then use a search tree in which we are constantly simplifying the instance according to the reduction rules (see Subsection 3.1). The branching in the search tree will be done with respect to low degree vertices. The analysis of this algorithm will be carried out in a new branching theorem (see Subsection 3.2), which is based on the Euler formula for planar graphs. In addition, we give an example showing that the bound of the branching theorem is optimal (see Subsection 3.3). Finally, it is worth noting here that the algorithm we present is very simple and easy to implement.

Due to lack of space, several proof details had to be omitted.

² This means that, for each $G = (V, E) \in \mathcal{G}$, there exists a $u \in V$ with $\deg_G(u) \le d$.

2 Preliminaries 

We assume familiarity with basic notions and concepts in graph theory; see, e.g., [6]. For a graph $G = (V, E)$ and a vertex $u \in V$, we use $N(u)$ and $N[u]$ to denote the open and closed neighborhood of $u$, respectively. By $\deg_G(u) := |N_G(u)|$, we denote the degree of the vertex $u$ in $G$. A pendant vertex is a vertex of degree one. For $V' \subseteq V$, the induced subgraph of $V'$ is denoted by $G[V']$. In particular, we use the abbreviation $G - V' := G[V \setminus V']$. If $V'$ is a singleton $\{v\}$, then we omit brackets and simply write $G - v$. In addition, we write $G - e$ or $G + e$ when we delete or add an edge $e$ to $G$ without changing the vertex set of $G$.

Let $G$ be a connected planar graph, i.e., a connected graph that admits a crossing-free embedding in the plane. Such an embedding is called a plane embedding. A planar graph together with a plane embedding is called a plane graph. Note that a plane graph can be seen as a subset of the Euclidean plane $\mathbb{R}^2$. The set $\mathbb{R}^2 \setminus G$ is open; its regions are the faces of $G$. Let $\mathcal{F}$ be the set of faces of a plane graph. The size of a face $F \in \mathcal{F}$ is the number of vertices on the boundary of the face. A triangular face is a face of size three. If $G$ is a plane graph and $V' \subseteq V$, then $G[V']$ and $G - V'$ can always be considered as plane graphs with an embedding inherited from the embedding of $G$.



3 The Algorithm and Its Analysis 

Our algorithm is based on reduction rules (see Subsection 3.1) and an improved 
branching theorem (see Subsection 3.2). With respect to our set of reduction 
rules, we show optimality for the branching theorem (see Subsection 3.3). 



3.1 Reduction Rules 

We consider the following reduction rules for simplifying the annotated planar DOMINATING SET problem. In developing the search tree, we will always assume that we are branching from a reduced instance (thus, we are constantly simplifying the instance according to the reduction rules).³ When a vertex $u$ is placed in the dominating set $D$ by a reduction rule, then the target size $k$ for $D$ is reduced to $k - 1$ and the neighbors of $u$ are whitened.

³ The idea of doing so-called rekernelizations (i.e., repeated application of reduction rules) while constructing the search tree was already exhibited in [9,10] in a somewhat different context.

114

Jochen Alber et al.

(R1) Delete edges between white vertices.

(R2) Delete a pendant white vertex.

(R3) If there is a pendant black vertex $w$ with neighbor $u$ (either black or white), then delete $w$, place $u$ in the dominating set, and lower $k$ to $k - 1$.

(R4) If there is a white vertex $u$ of degree 2 with two black neighbors $u_1$ and $u_2$ connected by an edge $\{u_1, u_2\}$, then delete $u$.

(R5) If there is a white vertex $u$ of degree 2 with black neighbors $u_1, u_3$, and there is a black vertex $u_2$ and edges $\{u_1, u_2\}$ and $\{u_2, u_3\}$ in $G$, then delete $u$.

(R6) If there is a white vertex $u$ of degree 2 with black neighbors $u_1, u_3$, and there is a white vertex $u_2$ and edges $\{u_1, u_2\}$ and $\{u_2, u_3\}$ in $G$, then delete $u$.

(R7) If there is a white vertex $u$ of degree 3 with black neighbors $u_1, u_2, u_3$ for which the edges $\{u_1, u_2\}$ and $\{u_2, u_3\}$ are present in $G$ (and possibly also $\{u_1, u_3\}$), then delete $u$.
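The first three rules are purely local and can be applied exhaustively in a simple loop. The sketch below covers only (R1) to (R3); rules (R4) to (R7) are omitted, and the function name and data layout are our own. One further assumption: after (R3) places $u$ in the dominating set, we delete $u$ once its neighbors are whitened, since it has no remaining role as a candidate.

```python
def apply_r1_r3(adj: dict, black: set, k: int):
    """Apply (R1)-(R3) exhaustively to an annotated instance (sketch only).

    adj: vertex -> set of neighbours, modified in place; black: set of black
    vertices, modified in place. Returns (k, chosen) where chosen lists the
    vertices put into the dominating set by (R3). (R4)-(R7) are not handled.
    """
    chosen = []

    def delete(x):
        for y in adj.pop(x):
            adj[y].discard(x)
        black.discard(x)

    changed = True
    while changed:
        changed = False
        for x in adj:                            # (R1) white-white edges
            if x not in black:
                whites = {y for y in adj[x] if y not in black}
                if whites:
                    adj[x].difference_update(whites)
                    for y in whites:
                        adj[y].discard(x)
                    changed = True
        for x in list(adj):
            if x not in adj or len(adj[x]) != 1:
                continue
            (u,) = adj[x]
            if x not in black:                   # (R2) pendant white vertex
                delete(x)
            else:                                # (R3) pendant black vertex x
                delete(x)
                chosen.append(u)
                k -= 1
                black.difference_update(adj[u])  # whiten the neighbours of u
                delete(u)                        # assumption: drop u, now in D
            changed = True
    return k, chosen


star = {0: {1, 2}, 1: {0}, 2: {0}}
blk = {0, 1, 2}
print(apply_r1_r3(star, blk, 2), star, blk)   # (1, [0]) {2: set()} set()
```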

Let us call a set of simplifying reduction rules for a certain problem sound if, whenever $(G, k)$ is some problem instance and instance $(G', k')$ is obtained from $(G, k)$ by applying one of the reduction rules, then $(G, k)$ has a solution iff $(G', k')$ has a solution. A simple case analysis shows:

Lemma 1. The reduction rules are sound. □

Suppose that $G$ is a reduced graph, that is, none of the above reduction rules can be applied. By using the rules (R1), (R2), (R4), and (R7), we can show:

Lemma 2. Let $G = (B \uplus W, E)$ be a plane black and white graph. If $G$ is reduced, then the white vertices form an independent set and every triangular face of $G[B]$ is empty. □

3.2 A New Branching Theorem 

Theorem 3. If $G = (B \uplus W, E)$ is a planar black and white graph that is reduced, then there exists a black vertex $u \in B$ with $\deg_G(u) \le 7$.

The following technical lemma, based on an “Euler argument,” will be needed. Note that if there is any counterexample to the theorem, then there is a connected counterexample.

Lemma 4. Suppose $G = (B \uplus W, E)$ is a connected plane black and white graph with $b$ black vertices, $w$ white vertices, and $e$ edges. Let the subgraph induced by the black vertices be denoted $H = G[B]$. Let $c_H$ denote the number of components of $H$ and let $f_H$ denote the number of faces of $H$. Let

$$ z = (3(b + w) - 6) - e \tag{1} $$

measure the extent to which $G$ fails to be a triangulation of the plane. If the criterion

$$ 3w - 4b - z + f_H - c_H < 7 \tag{2} $$

is satisfied, then there exists a black vertex $u \in B$ with $\deg_G(u) \le 7$.
Proof. Let the (total) numbers of vertices, edges, and faces of $G$ be denoted $v$, $e$, $f$, respectively. Let $e_{bw}$ be the number of edges in $G$ between black and white vertices, and let $e_{bb}$ denote the number of edges between black vertices. With this notation, we have the following relationships:

$$ v - e + f = 2 \qquad \text{(Euler formula for } G\text{)} \tag{3} $$

$$ v = b + w \tag{4} $$

$$ e = e_{bb} + e_{bw} \tag{5} $$

$$ b - e_{bb} + f_H = 1 + c_H \qquad \text{((extended) Euler formula for } H\text{)} \tag{6} $$

$$ e_{bb} + e_{bw} = 3v - 6 - z \qquad \text{(by Eq. (1), (4), and (5))} \tag{7} $$

If the lemma were false, then every black vertex would have degree at least 8, and we would have, using (5),

$$ 8b \le \sum_{u \in B} \deg_G(u) = 2e_{bb} + e_{bw} = e_{bb} + e. \tag{8} $$

We will assume this and derive a contradiction. The following chain holds:

$$ \begin{aligned} 3 + c_H &= v + b - (e_{bb} + e) + f + f_H && \text{(by (3) and (6))}\\ &\le v + b - 8b + f + f_H && \text{(by (8))}\\ &= 3v - 7b + f_H - 4 - z && \text{(by (3) and (7))}\\ &= 3w - 4b + f_H - 4 - z && \text{(by (4))}. \end{aligned} $$

Rearranged, this says $3w - 4b - z + f_H - c_H \ge 7$, which contradicts (2). □



When proving Theorem 3 by contradiction, it will be helpful to know that a corresponding counterexample can be assumed to be connected and to have minimum degree 3.

Lemma 5. If there is any counterexample to Theorem 3, then there is a connected counterexample where $\deg_G(u) \ge 3$ for all $u \in W$.

Proof. Suppose $G$ is a counterexample to the theorem. Then $G$ does not have any white vertices of degree 1, else reduction rule (R2) can be applied. Let $G'$ be obtained from $G$ by simultaneously replacing every white vertex $u$ of degree 2 with neighbors $x$ and $y$ by an edge $\{x, y\}$. The neighbors $x$ and $y$ of $u$ are necessarily black, else (R1) can be applied, and in each case the edge $\{x, y\}$ is not already present in $G$, else rule (R4) would apply. We argue that $G'$ is reduced. If not, then the only possibility is that reduction rule (R7) applies to some white vertex $u$ of degree 3 in $G'$. If rule (R7) did not apply to $u$ in $G$, then one of the edges between the neighbors of $u$ must have been created in our derivation of $G'$ from $G$, i.e., one of these edges replaced a white vertex $u'$ of degree 2. But this implies that reduction rule (R6) could be applied in $G$ to $u'$, contradicting that $G$ is reduced. □

Before giving the proof of Theorem 3, we introduce the following notation:

Notation: Let $G = (B \uplus W, E)$ be a plane black and white graph and let $\mathcal{F}$ be the set of faces of $G[B]$. Then, for each $F \in \mathcal{F}$, we let

• $w_F$ denote the number of white vertices embedded in $F$,

• $z_F$ denote the number of edges that would have to be added in order to complete a triangulation of that part of the embedding of $G$ contained in $F$,

• $t_F$ denote the number of edges needed to triangulate $F$ in $G[B]$ (that is, triangulating only between the black vertices on the boundary of $F$, and noting that the boundary of $F$ may not be connected), and

• $c_F$ denote the number of components of the boundary of $F$, minus 1.

Observe that the numbers $z_F$ and $t_F$ are indeed well-defined. This can be shown by using [6, Proposition 4.2.6].

Proof (of Theorem 3). We can assume that if there is a counterexample $G$, then $G$ is connected, but the black subgraph $H := G[B]$ might not be connected. Moreover, by Lemma 5 we may assume that $\deg_G(u) \ge 3$ for all $u \in W$. If $c_H$ denotes the number of components of $H$, by induction on $c_H$ it is easy to see that $c_H - 1 = \sum_{F \in \mathcal{F}} c_F$. Also, if $z$ is the number of edges needed to triangulate $G$, we clearly get $z = \sum_{F \in \mathcal{F}} z_F$. The criterion (2) established by Lemma 4 can be rephrased as

$$ 3\sum_{F \in \mathcal{F}} w_F - \sum_{F \in \mathcal{F}} z_F - 4b + f_H - c_H < 7, $$

which, using $f_H = |\mathcal{F}|$ and $c_H - 1 = \sum_{F \in \mathcal{F}} c_F$, is equivalent to

$$ 3\sum_{F \in \mathcal{F}} \Big(w_F + \frac{c_F}{3} - \frac{z_F}{3} + \frac{1}{3}\Big) - 4b - 2c_H < 6. $$

Now, assume that we can show the inequality

$$ w_F + \frac{c_F}{3} - \frac{z_F}{3} + \frac{1}{3} \le \alpha t_F + \beta \tag{9} $$

for some constants $\alpha$ and $\beta$ and for every face $F$ of the subgraph $H$. Call this our linear bound assumption. Then criterion (2) will hold if

$$ 3\sum_{F \in \mathcal{F}} (\alpha t_F + \beta) - 4b - 2c_H = 3\alpha\Big(\sum_{F \in \mathcal{F}} t_F\Big) + 3\beta\Big(\sum_{F \in \mathcal{F}} 1\Big) - 4b - 2c_H < 6. $$

Noting that $\sum_{F \in \mathcal{F}} t_F$ is the number of edges needed to triangulate $H$, we have

$$ \sum_{F \in \mathcal{F}} t_F = 3b - 6 - e_{bb}. $$

The number of faces of $H$ is $\sum_{F \in \mathcal{F}} 1 = f_H = e_{bb} - b + 1 + c_H$, by Euler's formula (6). Together, these give us the following targeted criterion:

$$ 3\alpha(3b - 6 - e_{bb}) + 3\beta(e_{bb} - b + 1 + c_H) - 4b - 2c_H < 6. $$

Multiplying out and gathering terms, we need to establish (using the linear bound assumption) that

$$ b(9\alpha - 3\beta - 4) + e_{bb}(3\beta - 3\alpha) + 3\beta(1 + c_H) - 18\alpha - 2c_H < 6. $$

This inequality is easily verified for $\alpha = \beta = 2/3$.
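For the record, substituting $\alpha = \beta = 2/3$ into the left-hand side shows that the coefficients of $b$ and $e_{bb}$ vanish and the $c_H$ terms cancel, so the inequality holds independently of $b$, $e_{bb}$, and $c_H$:

$$ b\Big(9\cdot\tfrac{2}{3} - 3\cdot\tfrac{2}{3} - 4\Big) + e_{bb}\Big(3\cdot\tfrac{2}{3} - 3\cdot\tfrac{2}{3}\Big) + 3\cdot\tfrac{2}{3}(1 + c_H) - 18\cdot\tfrac{2}{3} - 2c_H = 0 + 0 + 2 + 2c_H - 12 - 2c_H = -10 < 6. $$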

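To complete the argument, we need to establish that the linear bound assumption (9) with $\alpha = \beta = 2/3$ holds for faces of reduced graphs, i.e., that

$$ w_F + \frac{c_F}{3} - \frac{z_F}{3} \le \frac{2t_F}{3} + \frac{1}{3}. \tag{10} $$

But this is a consequence of the following Propositions 6 and 8: adding one third of the inequality $w_F + c_F \le z_F + 1$ of Proposition 6 to two thirds of the inequality $w_F \le t_F$ of Proposition 8 yields exactly (10). □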

Proposition 6. Let $G = (B \uplus W, E)$ be a reduced plane black and white graph and let $F$ be a face of $G[B]$. Then, using the notation above, we have $w_F + c_F \le z_F + 1$.

Proof. Consider the “face-graph” $G_F := G[B_F \cup W_F]$, where $B_F$ is the set of black vertices forming the boundary of $F$ and $W_F$ is the set of white vertices inside $F$. Note that $G_F$ may consist of several “black components,” connected among themselves through white vertices. Contracting each of these black components into one (black) vertex, we obtain the bipartite graph $G'_F$. Note that both the black and the white vertices form independent sets in $G'_F$. Clearly, $G'_F$ is still planar. Since $G'_F$ is a bipartite planar graph, it is easy to show that, preserving planarity, we can connect the white vertices among themselves by a tree of $w_F - 1$ white-white edges and that we can connect the black vertices among themselves by a tree of $c_F$ black-black edges. Clearly, this implies that we can also add at least $c_F + w_F - 1$ new edges to $G_F$ without destroying planarity. Hence, we need at least $c_F + w_F - 1$ additional edges to triangulate the interior of $F$ in the graph $G$. □

Property 7. If $F_1$ and $F_2$ are two faces of $G[B]$ with common boundary edge $e$, then $t_{F_1} + t_{F_2} + 1$ equals $t_F$, where we now consider $(G - e)[B]$, and $F$ is the face which results from merging $F_1$ and $F_2$ when deleting $e$.

Proposition 8. Suppose $G = (B \uplus W, E)$ is a reduced plane black and white graph with $\deg(u) \ge 3$ for all $u \in W$. Let $F$ be a face of $G[B]$. Then, using the notation above, $w_F \le t_F$.

Proof. Consider a reduced black and white graph $G = (B \uplus W, E)$ with $\deg(u) \ge 3$ for all $u \in W$. If there is some $u \in W$ with $\deg(u) > 4$, then arbitrarily delete all edges incident with $u$ except four of them. This preserves the black induced subgraph, and since no rules apply to white degree-4 vertices, the resulting graph is still reduced. Therefore, we can assume from now on without loss of generality that all white vertices of $G$ have degree at most four.

We will now show the claim by induction on the number of white vertices of degree four. The hardest part is the induction base, which is deferred to the subsequent Lemma 9. Assume that the claim has been shown for each graph with at most $\ell$ white degree-4 vertices, and assume now that $G$ has $\ell + 1$ white degree-4 vertices. Choose some arbitrary $u \in W$ with $\deg(u) = 4$. Let $\{b_1, \dots, b_4\}$ be the neighbors of $u$ in clockwise order. Due to planarity, we may assume further that $\{b_1, b_3\} \notin E$ without loss of generality. Consider now $G' = (G - u) + \{b_1, b_3\}$. We prove below that $G'$ (or $G'' = (G - u) + \{b_2, b_4\}$ in one special case) is reduced. This means that the induction hypothesis applies to $G'$. Hence, $w_F \le t_F$ for all faces in $G'[B]$. Observe that $G'$ contains all the faces of $G$ except for the face $F$ of $G$ which contains $u$; $F$ might be replaced by two faces $F_1$ and $F_2$ with common boundary edge $\{b_1, b_3\}$. In this case, $w_{F_1} \le t_{F_1}$, $w_{F_2} \le t_{F_2}$, $w_{F_1} + w_{F_2} + 1 = w_F$ and, by Property 7, $t_{F_1} + t_{F_2} + 1 = t_F$. Hence, $w_F \le t_F$ by induction. In the case where face $F$ still exists in $G'$, it is trivial to see that $w_F \le t_F$.

To complete the proof, we argue why $G'$ has to be reduced. Obviously, this is clear if every white neighbor of each $b_i$ has degree 4, since no reduction rules apply to degree-4 vertices. We now discuss the case that some $b_i$ has white degree-3 vertices as neighbors.

1. If a degree-3 vertex is a neighbor of some $b_i$, but not of $b_j$, $j \ne i$, then no reduction rule is triggered when constructing $G'$.

2. Consider the case that a degree-3 vertex is a neighbor of two vertices $b_i, b_j$, $i \ne j$. We can assume that $\{i, j\} \ne \{1, 3\}$, since otherwise $\{b_2, b_4\} \notin E$ and we could consider $G'' = (G - u) + \{b_2, b_4\}$ instead of $G'$, with an argument similar to the case $\{i, j\} = \{2, 4\}$. If $\{i, j\} = \{2, 4\}$, then $G'$ is clearly reduced. If $\{i, j\} = \{1, 2\}$ (or, more generally, $|\{i, j\} \cap \{1, 3\}| = 1$), then no reduction rules are triggered when passing from $G$ to $G'$.

3. If a degree-3 vertex is a neighbor of three vertices $b_i, b_j, b_k$, then a reasoning similar to the one in the previous point applies.

This concludes the proof of the proposition. □

The following lemma serves as the induction base in the proof of Proposition 8.

Lemma 9. Suppose $G = (B \uplus W, E)$ is a reduced plane black and white graph with $\deg(u) = 3$ for all $u \in W$. Let $F$ be a face of $G[B]$. Then, using the notation above, $w_F \le t_F$.

Proof. (Sketch) Let us consider a fixed planar embedding of the graph $G$, and consider a face $F$ of the black induced subgraph $G[B]$. Let $W_F \subseteq W$ be the set of white vertices in the interior of $F$, and let $B_F \subseteq B$ denote the black vertices on the boundary of $F$. We want to find at least $|W_F|$ many black-black edges that can be added to $G[B]$ inside $F$. For that purpose, define the set

$$ E^{\mathrm{poss}} := \{\, e = \{b_1, b_2\} \mid b_1, b_2 \in B_F,\ e \notin E(G[B]) \,\} $$

of non-existing black-black edges.

For a subset $W' \subseteq W_F$ we construct a bipartite graph $H(W') := (W' \uplus T(W'), E(W'))$ as follows. In $H(W')$, the first bipartition set is formed by the vertices $W'$ and the second one is given by the set

$$ T(W') := \{\, e = \{b_1, b_2\} \in E^{\mathrm{poss}} \mid \exists u \in W' : e \subseteq N_G(u) \,\}. $$

The edges in $H(W')$ are then given by

$$ E(W') := \{\, \{u, e\} \mid u \in W',\ e \in T(W'),\ e \subseteq N_G(u) \,\}. $$
Fig. 1. Illustration of a diamond $D$ generated by a pair of vertices $\{b_1, b_2\} \in T(W_F)$



In this way, the set $T(W')$ gives us vertices in $H(W')$ that correspond to pairs $e = \{b_1, b_2\}$ of black vertices in $B_F$ between which we can still draw an edge in $G[B]$. Note that the edge $e$ can even be drawn in the interior of $F$, since $b_1$ and $b_2$ are connected via a white vertex in $W' \subseteq W_F$. This means that

$$ |T(W_F)| \le t_F. \tag{11} $$

Due to reduction rule (R7), for each $u \in W_F$, the three neighbors $N(u) \subseteq B_F$ are connected by at most one edge in $G[B]$; hence at least two of the three pairs of neighbors of $u$ belong to $E^{\mathrm{poss}}$. By construction of $H(W_F)$, we find:

$$ \deg_{H(W_F)}(u) \ge 2 \quad \text{for all } u \in W_F. \tag{12} $$

The degree $\deg_{H(W_F)}(e)$ of an element $e = \{b_1, b_2\} \in T(W_F)$ tells us how many white vertices share the pair $\{b_1, b_2\}$ as common neighbors. We do a case analysis according to this degree.

Case 1: Suppose $\deg_{H(W_F)}(e) \le 2$ for all $e \in T(W_F)$. Then $H(W_F)$ is a bipartite graph in which every vertex of the first bipartition set has degree at least two (see Eq. (12)) and every vertex of the second bipartition set has degree at most two. Hence the second set cannot be smaller than the first, which yields

$$ w_F = |W_F| \le |T(W_F)| \overset{(11)}{\le} t_F. $$
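The step from the two degree conditions to $|W_F| \le |T(W_F)|$ is a double count of the edge set $E(W_F)$ of the bipartite graph $H(W_F)$, using (12) on the left and the Case 1 hypothesis on the right:

$$ 2\,|W_F| \le \sum_{u \in W_F} \deg_{H(W_F)}(u) = |E(W_F)| = \sum_{e \in T(W_F)} \deg_{H(W_F)}(e) \le 2\,|T(W_F)|. $$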

Case 2: There exist elements e = {61, 62} in T{Wp) which are shared as common 
neighbors by more than 2 white vertices (i.e., deg^(i,i^^)(e) = m > 2). Suppose 
we have ui, . . . ,Um & Wp with Ng{ui) = {61, 62, Zt} (i.e., {ui, e} € E{Wp)). We 
may assume that the vertices are ordered such that the closed region D bounded 
by {61, Ml, 62, Mm} contains all other vertices M2, ... , Um-i (see Figure 1). 

We call D the diamond generated by {61, 62}. Note that D consists of m — 1 
regions, which we call blocks in the following; the block Di is bounded by 
{61, Mi, 62, Mi+i} (z = 1 , . . . , TO — 1). Let Wi C Wp, and Bi C Bp, respectively, 
denote the white and black, respectively, vertices that lie in Di . For the boundary 
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vertices {6i, 62, ui, - . . , Um} we use the following convention: 61, 62 are added to 
all blocks, i.e., 61, 62 G Bi for all z; and Ui is added to the region where its third 
neighbor Zi lies in. A block is called empty if Bi = {61,62} and, hence, Wi = 0. 
Moreover, let Wd ■= and Bjj '■= 

We only consider diamonds, where Z\ and Zm are not contained in D (see 
Figure 1). The other cases can be treated with similar arguments. 

Note that each block of a diamond D may contain further diamonds, the 
blocks of which may contain further diamonds, and so on. Since no diamonds 
overlap, the topological inclusion forms a natural ordering on the set of diamonds 
and their blocks. 

We can show the following claim by induction on the diamond structure: 

Claim: For each diamond $D$ generated by $\{b_1, b_2\}$, we can add $t_D$ (where $t_D \ge |W_D|$) many black-black edges to $G[B]$ other than $\{b_1, b_2\}$. All of these additional edges can be drawn inside $D$ so that $\{b_1, b_2\}$ can still be drawn.

Using this claim, we can finish the proof of the induction base of the proposition: Consider all diamonds $D^1, \dots, D^r$ which are not contained in any further diamond. Suppose $D^i$ has boundary $\{b_1^i, b_2^i, u_1^i, u_{m_i}^i\}$ with $b_1^i, b_2^i \in B_p$ and $u_1^i, u_{m_i}^i \in W_p$. Let

$$W'_p := W_p \setminus \Big(\bigcup_{i=1}^{r} W_{D^i}\Big).$$

According to the claim, we have already found $\sum_{i=1}^{r} t_{D^i}$ many black-black edges inside the diamonds $D^i$. Observe that each pair $e^i = \{b_1^i, b_2^i\}$ is shared as common neighbors by at most two white vertices (namely, $u_1^i$ and $u_{m_i}^i$) in (sic!) $W'_p$. Hence, the bipartite graph $H(W'_p)$ again has the property that

• $\deg_{H(W'_p)}(e) \le 2$ for all $e \in T(W'_p)$ and still

• $\deg_{H(W'_p)}(u) \ge 2$ for all $u \in W'_p$.



Similar to Case 1, this proves that we additionally find $t'$ (with $t' \ge |W'_p|$) many edges in $G[B_p]$. Hence,

$$w_p = |W_p| = |W'_p| + \sum_{i=1}^{r} |W_{D^i}| \le t' + \sum_{i=1}^{r} t_{D^i} \le t_p.$$



□ 



Using Theorem 3 for the construction of a search tree as elaborated in Section 1, we conclude:



Theorem 10. (Annotated) dominating set on planar graphs can be solved in time $O(8^k n)$. □
Fig. 2. A reduced graph with all black vertices having degree 7, thus showing the optimality of the bound derived in our branching theorem



3.3 Optimality of the Branching Theorem 

We conclude this section with the observation that, with respect to the set of reduction rules introduced above, the upper bound in our branching theorem is optimal. More precisely, there exists a plane reduced black and white graph with the property that all black vertices have degree at least 7. Such a graph is shown in Figure 2. Moreover, this example can be generalized to an infinite set of plane graphs with the property that all black vertices have degree at least 7. The given example is the smallest of all graphs in this class. It is an interesting and challenging task to find further reduction rules that would yield a provably better constant in the branching theorem. For example, one might think of the following generalization of reduction rule (R6):

(R6’) If there are white vertices $u_1, u_2 \in W$ with $N_G(u_1) \subseteq N_G(u_2)$, then delete $u_1$.

However, the graph in Figure 2 is reduced even with respect to rule (R6’). 
We leave it as an open question to come up with further reduction rules such 
that the graph of Figure 2 is no longer reduced. 

4 Conclusion and Open Questions 

In this paper, we gave the first search tree algorithm proven to be correct for the DOMINATING SET problem on planar graphs. It improves on the original flawed theorem stating an exponential term $11^k$, which is now lowered to $8^k$. Unfortunately, the proof of correctness has become considerably more involved and fairly technical.

⁴ Note that, according to the claim, the edges $\{b_1^i, b_2^i\}$ can still be used.

Our work suggests several directions for future research: 

• Can we improve the branching theorem by adding further, more involved 
reduction rules? 

• Finding a so-called problem kernel (see [8] for details) of polynomial size $p(k)$ for DOMINATING SET on planar graphs in time $T_K(n, k)$ would improve the running time to $O(8^k + T_K(n, k))$ using the interleaving technique analyzed in [10]. Currently, we even hope for a linear-size problem kernel for DOMINATING SET on planar graphs.

• Since our results for the search tree itself are based on the Euler formula, a generalization to the class of graphs $G(S_g)$ (allowing a crossing-free embedding on an orientable surface $S_g$ of genus $g$) seems likely.

Acknowledgment. We thank Klaus Reinhardt for discussions on the topic of this work and for pointing out an error in an earlier version of the paper.
Abstract. In this paper, we consider the help bit problem in the decision tree model proposed by Nisan, Rudich and Saks (FOCS ’94). When computing $k$ instances of a Boolean function $f$, Beigel and Hirst (STOC ’98) showed that $\lfloor \log_2 k \rfloor + 1$ help bits are necessary to reduce the complexity of $f$, for any function $f$, and exhibited functions for which $\lfloor \log_2 k \rfloor + 1$ help bits reduce the complexity. Their functions must satisfy the condition that their complexity is greater than or equal to $k - 1$. In this paper, we show new functions satisfying the above conditions whose complexity is only $2\sqrt{k}$. We also investigate the help bit problem when we are only allowed to use decision trees of depth 1. Moreover, we exhibit the close relationship between the help bit problem and the complexity of circuits with a majority gate at the top.



1 Introduction and Definitions 

Pick your favorite computational model for computing Boolean functions. Let $f$ be a Boolean function and let $c$ be the complexity of $f$ in the model. The help bit problem and the cover problem, which we will mainly deal with in this paper, are informally as follows.

Help Bit Problem. Suppose you wish to compute $f$ on two inputs $x$ and $y$, and are allowed one “help bit”, i.e., an arbitrary function of the two inputs. Is it possible to choose this help-bit function so that, given the help bit, $f(x)$ and $f(y)$ can each be computed with a computational complexity of less than $c$, and if so, by how much? How about computing $f$ on $k$ inputs with $l$ help bits?
Cover Problem. Suppose you wish to compute $f$ on two inputs $x$ and $y$, giving two pairs of answers $(a_x^1, a_y^1)$ and $(a_x^2, a_y^2)$ such that at least one pair $(a_x^i, a_y^i)$, $i \in \{1, 2\}$, equals $(f(x), f(y))$. Does a complexity less than $c$ ever suffice to compute an answer for an input? What if you wish to evaluate $f$ on $k$ inputs and you are allowed to give $m$ answer tuples, generalizing the number of inputs and the number of answer tuples?

The notion of a help bit was first introduced by Cai [Cai89] in the context of constant depth circuits. Amir et al. [ABG00] studied general circuit models. Nisan et al. [NRS94] and Beigel and Hirst [BH98] investigated these two problems for the decision tree model. In this paper, in the same spirit, we consider these problems for the decision tree model of computation. Below are the formal definitions for the cover problem, following Beigel and Hirst [BH98].
Definition 1. A decision tree over a variable set $X$ is a triple $(T, p, a)$ where $T$ is a rooted ordered binary tree, $p$ is a mapping from the internal nodes of $T$ to the set $X$, and $a$ is a mapping from the leaves of $T$ to $\{0, 1\}$. For a node $v$ of $T$, the variable $p(v)$ which is assigned to $v$ is called the label of $v$.

A decision tree $T$ over $X$ computes a Boolean function $f$ as follows: If $\alpha$ is any assignment in $\{0, 1\}^X$, the computation of $T$ on $\alpha$ is the unique path $v_0, v_1, \dots, v_s$ from the root $v_0$ of $T$ to some leaf $v_s$. Inductively, $v_{i+1}$ for $i \ge 0$ is defined to be the $\alpha(p(v_i))$th child of $v_i$. The output of the computation is given by $a(v_s)$, which is denoted by $T(\alpha)$.

The depth of a decision tree $T$ is the length of the longest path in $T$. DEPTH($d$) is the class of Boolean functions computable by decision trees of depth at most $d$.
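Definition 1 can be made concrete with a small evaluator. The following Python sketch uses our own hypothetical representation (the classes `Leaf` and `Node` are not from the paper): an internal node stores its label $p(v)$ and its 0-child and 1-child, a leaf stores $a(v)$, and evaluation walks the unique root-to-leaf path.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    value: int  # a(v) in {0, 1}


@dataclass
class Node:
    label: str       # p(v): the variable queried at this internal node
    children: tuple  # (0-child, 1-child), in this order


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, alpha: Dict[str, int]) -> int:
    """Follow the path v_0, v_1, ...: from v_i go to its alpha(p(v_i))-th child."""
    v = tree
    while isinstance(v, Node):
        v = v.children[alpha[v.label]]
    return v.value  # T(alpha) = a(v_s)


def depth(tree: Tree) -> int:
    """Length (number of edges) of the longest root-to-leaf path."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(c) for c in tree.children)


# Example: a depth-2 tree for x OR y (query x; if x = 0, query y).
t = Node("x", (Node("y", (Leaf(0), Leaf(1))), Leaf(1)))
print(depth(t))  # 2
```

A constant function is a single `Leaf` of depth 0, so DEPTH(0) consists exactly of the two constants.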

Definition 2. A decision forest over a variable set $X$ is an ordered tuple of decision trees over $X$. The forest $F = (T_1, \dots, T_k)$ computes the function

$$F(\alpha) = (T_1(\alpha), \dots, T_k(\alpha)).$$

The depth of the forest $(T_1, \dots, T_k)$ is the maximum of the depths of $T_1, \dots, T_k$.

Definition 3. A family of decision forests over a variable set $X$ is a set of decision forests over $X$. The depth of a family of decision forests $\{F_1, \dots, F_m\}$ is the maximum of the depths of $F_1, \dots, F_m$. The size of a family of forests $\mathcal{F}$ is the number of forests in $\mathcal{F}$. The width of a family of forests $\mathcal{F}$ is the number of trees in each forest of $\mathcal{F}$. A family $\mathcal{F}$ of forests covers a function $g$ if

$$\forall \alpha\ \exists F \in \mathcal{F}:\ F(\alpha) = g(\alpha).$$

When $\mathcal{F}$ covers $g$, we call $\mathcal{F}$ a cover for $g$.

Definition 4. Let $X$ be the union of mutually disjoint sets $X_1, \dots, X_k$. For $1 \le j \le k$, let $f_j$ be a function over $X_j$. Let

$$\mathcal{F} = \{(T_{11}, \dots, T_{1k}), \dots, (T_{m1}, \dots, T_{mk})\}$$

be a cover for $(f_1, \dots, f_k)$. The cover $\mathcal{F}$ is called pure if, for any $i, j$, the nodes of the tree $T_{ij}$ are labeled only with variables in $X_j$, and impure otherwise.

In this paper, we are mainly interested in covers for a tuple consisting of k 
isomorphic copies of the same function over the disjoint sets of variables. 

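Definition 5. Let $f$ be a Boolean function on $n$ variables. We define

$$f^{[k]}(x^1, \dots, x^k) = (f(x^1), \dots, f(x^k)),$$

where $x^1, \dots, x^k$ are mutually disjoint sets of $n$ variables each.

Let $\mathrm{PCover}_d(m, k)$ be the class of Boolean functions $f$ such that $f^{[k]}$ has a pure cover of size $m$, width $k$ and depth $d$. Let $\mathrm{Cover}_d(m, k)$ be the class of Boolean functions $f$ such that $f^{[k]}$ has a cover of size $m$, width $k$ and depth $d$.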
By definition, $\mathrm{PCover}_d(m, k) \subseteq \mathrm{Cover}_d(m, k)$ for any $d$, $m$ and $k$. One may think that an impure family of forests cannot do better than a pure family, i.e., $\mathrm{PCover}_d(m, k) = \mathrm{Cover}_d(m, k)$. However, this question is open [BH98]. As Beigel and Hirst have pointed out [BH98], we think that this is the most important open problem concerning the cover problem in the decision tree model. The help bit problem and the cover problem are strongly related to each other because it is known that a Boolean function $f$ is computable with $l$ help bits iff $f$ has a cover of size $2^l$ [BH98].

A brute-force construction (one forest of constant trees for each of the $2^k$ possible answer tuples) shows that $\mathrm{PCover}_d(2^k, k)$ contains all Boolean functions for every $d$. So it is natural to ask when a nontrivial cover exists. More precisely, for which pairs of values $m$ and $k$ does $\mathrm{DEPTH}(d) \subsetneq \mathrm{PCover}_d(m, k)$ or $\mathrm{DEPTH}(d) \subsetneq \mathrm{Cover}_d(m, k)$ hold? Concerning these problems, Beigel and Hirst [BH98] proved the following strong statements.

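$$\forall d\ \forall k \quad \mathrm{DEPTH}(d) = \mathrm{PCover}_d(k, k) = \mathrm{Cover}_d(k, k),$$

$$\forall k \ge 2\ \forall d \ge k - 1 \quad \mathrm{DEPTH}(d) \subsetneq \mathrm{PCover}_d(k + 1, k).$$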

The first claim says that $\lfloor \log_2 k \rfloor$ help bits do not help to evaluate a function on $k$ inputs. The second claim says that the first claim is tight in the sense that increasing the size of a family of forests from $k$ to $k + 1$ properly enlarges the set of functions covered, provided that the complexity of the functions is greater than or equal to $k - 1$. In Section 2.2 of the paper, we extend the second result to

$$\forall k \ge 2\ \forall d \ge 2\sqrt{k} \quad \mathrm{DEPTH}(d) \subsetneq \mathrm{PCover}_d(k + 1, k).$$
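Because $l$ help bits correspond to a cover of size $2^l$, the thresholds in these statements reduce to simple arithmetic. With $\lfloor \log_2 k \rfloor$ bits there are at most $k$ forests, which by the first claim never helps. One more bit gives at least $k + 1$ forests. The following short Python check illustrates this. The helper name is ours, for illustration only.

```python
def help_bits_needed(k: int) -> int:
    """Smallest l with 2**l >= k + 1, i.e. floor(log2 k) + 1 for k >= 1."""
    # For every positive integer k, int.bit_length() equals floor(log2 k) + 1.
    return k.bit_length()


for k in range(1, 9):
    l = help_bits_needed(k)
    # floor(log2 k) = l - 1 help bits give a cover of size 2**(l-1) <= k ...
    assert 2 ** (l - 1) <= k
    # ... while l help bits give a cover of size 2**l >= k + 1.
    assert 2 ** l >= k + 1
```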

In this paper, we also consider the computational power of a family of forests of depth 1. Let $\tau_d^{\mathrm{pure}}(k)$ (respectively, $\tau_d(k)$) be the minimum $m$ such that $\mathrm{DEPTH}(d) \subsetneq \mathrm{PCover}_d(m, k)$ (respectively, $\mathrm{DEPTH}(d) \subsetneq \mathrm{Cover}_d(m, k)$) holds. The values of $\tau_d^{\mathrm{pure}}(k)$ and $\tau_d(k)$ were known only for elementary cases such as $\tau_1^{\mathrm{pure}}(3) = 5$ [BH98]. In Section 2.1 of the paper, we determine the asymptotic values of $\tau_1^{\mathrm{pure}}(k)$ and of $\tau_1(k)$, and the exact values of $\tau_1^{\mathrm{pure}}(k)$ for $k \le 6$.

Finally, in Section 3, we exhibit the close relationship between the cover 
problem and the complexity for circuits with a majority gate at the top. 

2 Cover Problem 

In this section, we discuss the computational power of a family of depth-$d$ forests for relatively small $d$.

Before going into detail, we show an example of a cover. The following example was first suggested by Blum and appeared in [NRS94]. For more examples of covers, see [BH98].

Example 6. The majority function on 3 variables, denoted $\mathrm{MAJ}_3(x, y, z)$, is defined as

$$\mathrm{MAJ}_3(x, y, z) = \begin{cases} 1 & \text{if } x + y + z \ge 2, \\ 0 & \text{otherwise.} \end{cases}$$
Let $X_1 = \{x_1, y_1, z_1\}$, $X_2 = \{x_2, y_2, z_2\}$ and $X = X_1 \cup X_2$. Then the family of forests $\{(x_1, x_2), (y_1, y_2), (z_1, z_2)\}$ covers $\mathrm{MAJ}_3^{[2]}$. This can be easily seen because at most one of the variables $x_i$, $y_i$ and $z_i$ can be unequal to $\mathrm{MAJ}_3(x_i, y_i, z_i)$, so on any input at most two of the three forests are wrong. In what follows, when discussing pure covers we will drop the subscript that distinguishes variables from different instances. Accordingly, we represent the cover above as $\{(x, x), (y, y), (z, z)\}$.
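The covering claim in Example 6 can be confirmed by brute force over all $2^6$ assignments to $X_1 \cup X_2$. In the following Python sketch, a forest is given by the index of the variable it reads in each instance. Dropping one forest breaks the cover.

```python
from itertools import product


def maj3(x, y, z):
    return int(x + y + z >= 2)


# Pure cover {(x, x), (y, y), (z, z)}: each pair gives the variable index
# (0 = x, 1 = y, 2 = z) that the forest reads in instance 1 and instance 2.
FAMILY = [(0, 0), (1, 1), (2, 2)]


def covers_maj3_squared(family):
    """True iff for every input some forest outputs (MAJ3(X1), MAJ3(X2))."""
    for bits in product((0, 1), repeat=6):
        inst1, inst2 = bits[:3], bits[3:]
        target = (maj3(*inst1), maj3(*inst2))
        if not any((inst1[a], inst2[b]) == target for a, b in family):
            return False
    return True


print(covers_maj3_squared(FAMILY))            # True
print(covers_maj3_squared([(0, 0), (1, 1)]))  # False
```

With only the first two forests, an input with $(x_1, y_1, z_1) = (1, 0, 0)$ and $(x_2, y_2, z_2) = (0, 1, 0)$ defeats both of them.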

2.1 Depth 1 

Recall that $\tau_d^{\mathrm{pure}}(k)$ (respectively, $\tau_d(k)$) denotes the minimum $m$ such that

$\mathrm{DEPTH}(d) \subsetneq \mathrm{PCover}_d(m, k)$ (respectively, $\mathrm{DEPTH}(d) \subsetneq \mathrm{Cover}_d(m, k)$) holds. One of the most fundamental problems on covers is to determine the values of $\tau_d^{\mathrm{pure}}(k)$ and $\tau_d(k)$ for various values of $d$ and $k$. These questions seem to be difficult even if we restrict ourselves to $d = 1$. The only results on this problem obtained so far are $\tau_1^{\mathrm{pure}}(1) = 2$, $\tau_1^{\mathrm{pure}}(2) = 3$ and $\tau_1^{\mathrm{pure}}(3) = 5$ (see the technical report version of [BH98]). In this subsection, we determine the asymptotic values of $\tau_1^{\mathrm{pure}}(k)$ and $\tau_1(k)$, and the exact values of $\tau_1^{\mathrm{pure}}(k)$ for $k \le 6$.

For a Boolean function $f$ and an integer $d \ge 0$, the agreement probability between $f$ and DEPTH($d$), denoted $p(f, d)$, is defined to be the maximum real number $p$ such that there is a distribution $\mathcal{T}$ on DEPTH($d$) that satisfies $\forall \alpha\ \Pr_{T \in \mathcal{T}}[f(\alpha) = T(\alpha)] \ge p$. Nisan et al. [NRS94] obtained the following theorem, which gives a relationship between $p(f, d)$ and $\tau_d(k)$.

Theorem 7 ([NRS94]). For any $d \ge 1$,

$$\left(\frac{1}{\max_{f \notin \mathrm{DEPTH}(d)} p(f, d)}\right)^{k} \le \tau_d(k) \le \tau_d^{\mathrm{pure}}(k).$$

By virtue of this theorem, an upper bound on the agreement probability yields the corresponding lower bound on $\tau_d(k)$.

Theorem 8. For any $f \notin \mathrm{DEPTH}(1)$, $p(f, 1) \le 2/3$.

Proof. Let $f$ be a function that cannot be computed by a decision tree of depth 1, and let $n$ be the number of input variables of $f$. Since $f$ is not constant, there are an index $i$ and two strings $a \in \{0, 1\}^{i-1}$ and $b \in \{0, 1\}^{n-i}$ such that $f(a0b) \ne f(a1b)$ holds. Assume, for contradiction, that $p(f, 1) > 2/3$ holds, i.e., there exists a distribution $\mathcal{T}$ on DEPTH(1) such that, for any $\alpha \in \{0, 1\}^n$, $\Pr_{T \in \mathcal{T}}[T(\alpha) = f(\alpha)] > 2/3$ holds. Without loss of generality we can assume $f(a0b) = 0$ and $f(a1b) = 1$. Then $\Pr_{T \in \mathcal{T}}[T(a0b) = 0] > 2/3$ and $\Pr_{T \in \mathcal{T}}[T(a1b) = 1] > 2/3$ hold. It is obvious that $x_i$ is the only tree of depth 1 whose output changes from 0 to 1 when we change the input from $a0b$ to $a1b$. Since both events hold simultaneously with probability greater than $2/3 + 2/3 - 1 = 1/3$, the probability of the tree $x_i$ in $\mathcal{T}$ must be greater than $1/3$. But since $f$ is not the function $x_i$, there exists an input $\beta$ on which $f$ and $x_i$ output different values. Hence $\Pr_{T \in \mathcal{T}}[T(\beta) = f(\beta)] < 1 - 1/3 = 2/3$. This contradicts the assumption and completes the proof. □
Corollary 9. For any $k \ge 1$, we have $1.5^k \le \tau_1(k) \le \tau_1^{\mathrm{pure}}(k)$. □

For a pure family of forests, we can obtain a slightly stronger statement. 
Corollary 10. For any $k > 1$, we have $\lceil 1.5\, \tau_1^{\mathrm{pure}}(k - 1) \rceil \le \tau_1^{\mathrm{pure}}(k)$. □

For an upper bound on $\tau_1(k)$, we can show the following.

Theorem 11. For sufficiently large $k$,

$$\tau_1(k) \le \tau_1^{\mathrm{pure}}(k) \le (1 + k)\, 1.5^k.$$

Proof (Sketch). First we observe that $p(x \vee y, 1) = 2/3$. At least two of the decision trees $1, x, y$ are equal to $x \vee y$ for every assignment, so $p(x \vee y, 1) \ge 2/3$, and the reverse inequality is Theorem 8. Thus, as in the construction of a cover for the majority function described in Section 8.2 of [BH98], we can obtain a cover of depth 1 and size $(1 + k)(1/p(x \vee y, 1))^k$ for $(x \vee y)^{[k]}$ by using Lovász’s theorem on fractional covers [Lov75]. □
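The value $p(x \vee y, 1) = 2/3$ can be double-checked exactly with two certificates. The following Python sketch uses exact fractions. The uniform distribution on the trees $1, x, y$, taken from the proof above, shows $p \ge 2/3$. The uniform distribution on the inputs $00, 01, 10$ is our own choice. Under it, no depth-1 tree agrees with $x \vee y$ with probability above $2/3$. Since a max-min never exceeds the corresponding min-max, the second certificate also bounds $p$ from above.

```python
from fractions import Fraction
from itertools import product


def f(x, y):
    return x | y


# All depth-1 decision trees on the variables x, y: constants and single literals.
TREES = {
    "0": lambda x, y: 0, "1": lambda x, y: 1,
    "x": lambda x, y: x, "y": lambda x, y: y,
    "~x": lambda x, y: 1 - x, "~y": lambda x, y: 1 - y,
}
INPUTS = list(product((0, 1), repeat=2))


def agreement(tree_dist, inp):
    """Pr_{T ~ tree_dist}[T(inp) = f(inp)] for one fixed input."""
    return sum(w for name, w in tree_dist.items() if TREES[name](*inp) == f(*inp))


def correlation(input_dist, name):
    """Pr_{alpha ~ input_dist}[T(alpha) = f(alpha)] for one fixed tree T."""
    return sum(w for inp, w in input_dist.items() if TREES[name](*inp) == f(*inp))


third = Fraction(1, 3)
# Lower-bound certificate: the uniform distribution over the trees 1, x, y.
lower = min(agreement({"1": third, "x": third, "y": third}, a) for a in INPUTS)
# Upper-bound certificate: the uniform distribution over the inputs 00, 01, 10.
upper = max(correlation({(0, 0): third, (0, 1): third, (1, 0): third}, t)
            for t in TREES)
print(lower, upper)  # 2/3 2/3
```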

In view of Corollary 9 and Theorem 11, it is natural to expect that there is a constant $c(d)$ such that $\lim_{k \to \infty} (\tau_d(k))^{1/k} = c(d)$ for $d \ge 1$. We succeeded in proving $c(1) = 1.5$. In the next subsection, we will prove that $p(x_1 \vee x_2 x_3, 2) \ge 4/5$, which implies $c(2) \le 1.25$. We conjecture that $c(2) = 1.25$, but the exact value of $c(d)$ for $d \ge 2$ remains to be shown.

In the rest of this subsection, we show that the lower bound stated in Corollary 10 is optimal for $k \le 6$. (The optimality for $k \le 3$ is shown implicitly in [BH98].) The following table gives the exact values of $\tau_1^{\mathrm{pure}}(k)$ for small $k$. For $k = 7, 8$ it gives lower and upper bounds.

k                1   2   3   4   5    6    7        8
τ_1^pure(k)      2   3   5   8   12   18   27-31    41-54
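Start from $\tau_1^{\mathrm{pure}}(1) = 2$ and apply Corollary 10 repeatedly. This reproduces the exact values for $k \le 6$ in the table and the lower bounds 27 and 41 for $k = 7, 8$. A short Python check, using exact integer arithmetic to avoid rounding:

```python
def corollary10_chain(start: int, length: int) -> list:
    """Iterate t -> ceil(1.5 * t) starting from `start`."""
    values = [start]
    while len(values) < length:
        t = values[-1]
        values.append((3 * t + 1) // 2)  # equals ceil(3t / 2) for integer t
    return values


print(corollary10_chain(2, 8))  # [2, 3, 5, 8, 12, 18, 27, 41]
```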

Theorem 12. For any $k \le 6$, $\tau_1^{\mathrm{pure}}(k) = \lceil 1.5\, \tau_1^{\mathrm{pure}}(k - 1) \rceil$.

Proof. To prove the theorem, it is sufficient to construct pure families of depth 1, with the sizes given in the above table, that cover $f^{[k]}$ for some function $f$ that cannot be computed by any decision tree of depth 1. We choose the function $x \vee y$ as such a function. Figure 1 gives the pure covers for $(x \vee y)^{[k]}$ where $k = 4, 5$ and $6$. We found them after extensive trial and error. Unfortunately, we could not generalize our construction to generate an optimal cover for larger values of $k$. □
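Candidate covers like those in Figure 1 can be verified mechanically. The Python sketch below uses our own helper names. It enumerates all $2^{2k}$ inputs of $(x \vee y)^{[k]}$ and checks a family of pure depth-1 forests. Exhaustive search over whole families is only feasible for tiny $k$. There it finds minimum pure cover sizes 2 and 3 for $x \vee y$ at $k = 1, 2$, which agrees with the table.

```python
from itertools import combinations, product

# Depth-1 decision trees on one instance (x, y): the constants and the four literals.
TREES = {
    "0": lambda x, y: 0,
    "1": lambda x, y: 1,
    "x": lambda x, y: x,
    "y": lambda x, y: y,
    "~x": lambda x, y: 1 - x,
    "~y": lambda x, y: 1 - y,
}


def is_pure_cover(family, k):
    """True iff every input of (x OR y)^[k] is answered correctly by some forest.

    A forest is a k-tuple of tree names. Its j-th tree reads only instance j,
    so every family built this way is pure.
    """
    for bits in product((0, 1), repeat=2 * k):
        pairs = [(bits[2 * j], bits[2 * j + 1]) for j in range(k)]
        if not any(
            all(TREES[t](x, y) == (x | y) for t, (x, y) in zip(forest, pairs))
            for forest in family
        ):
            return False
    return True


def min_pure_cover_size(k, max_size):
    """Exhaustive search for the smallest pure depth-1 cover (tiny k only)."""
    forests = list(product(TREES, repeat=k))
    for m in range(1, max_size + 1):
        if any(is_pure_cover(fam, k) for fam in combinations(forests, m)):
            return m
    return None


print(is_pure_cover([("1", "1"), ("x", "y"), ("y", "x")], 2))  # True
print(min_pure_cover_size(1, 2), min_pure_cover_size(2, 3))    # 2 3
```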

Before closing this subsection, we give another interesting construction of a cover for the function $(x \vee y)^{[k]}$, which implies the upper bounds on $\tau_1^{\mathrm{pure}}(7)$ and $\tau_1^{\mathrm{pure}}(8)$ in the above table.

Let $T$ be a decision tree and $S = \{(T_{11}, \dots, T_{1k}), \dots, (T_{m1}, \dots, T_{mk})\}$ be a family of forests. Let

$$T \circ S = \{(T, T_{11}, \dots, T_{1k}), \dots, (T, T_{m1}, \dots, T_{mk})\},$$

$$S|_T = \{(T, T_{12}, \dots, T_{1k}), \dots, (T, T_{m2}, \dots, T_{mk})\}.$$
We define the family of forests recursively as follows.

52 = {(1,1)}, 52 = {(x,y)}, Sl = {{y,x)}, 

5^l = lo5^ = xoS^VJxoSl S^+^ = yoS%VJyoS%, (for A: > 2) 

and $S^k = S_1^k \cup S_x^k \cup S_y^k$ for $k \ge 2$. It is easy to see that $|S^k| = F_{k+2}$, where $F_n$ is the $n$th Fibonacci number, i.e., $F_1 = F_2 = 1$ and $F_n = F_{n-1} + F_{n-2}$ ($n \ge 3$). An easy but tedious argument, which we omit in this extended abstract, shows that the families $S^k$ cover $(x \vee y)^{[k]}$ for any $k \ge 2$.



2.2 Depth d 

In this subsection, we extend the theorem on the computational power of a family of decision forests of width $k$ and size $k + 1$ by proving the following.

Theorem 13. $\forall k\ \forall d \ge 2\sqrt{k}$: $\mathrm{DEPTH}(d) \subsetneq \mathrm{PCover}_d(k + 1, k)$.

The key to the proof is to find a function $f$ such that $f$ cannot be computed by any decision tree of depth $d$ and the agreement probability between $f$ and DEPTH($d$) is high. It has been shown [BH98] that the agreement probability between the OR of $d + 1$ variables and DEPTH($d$) is $1 - 1/(d + 1)$. Our function, which will be defined below, has a higher agreement probability, namely $1 - O(1/d^2)$.

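Definition 14. For $d \ge 0$, let $\mathrm{ALT}_d$ be the Boolean function on $d + 1$ variables defined as follows.

$$\mathrm{ALT}_0(x_1) = x_1,$$

$$\mathrm{ALT}_d(x_1, x_2, \dots, x_{d+1}) = \begin{cases} \bar{x}_1 \cdot \mathrm{ALT}_{d-1}(x_2, x_3, \dots, x_{d+1}) & \text{if } d \text{ is odd;} \\ x_1 \vee \mathrm{ALT}_{d-1}(x_2, x_3, \dots, x_{d+1}) & \text{if } d \text{ is even.} \end{cases}$$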
Fig. 2. The function ALT_d (left: d is odd; right: d is even)



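The decision trees computing the function $\mathrm{ALT}_d$ are shown in Figure 2. The number of satisfying assignments of $\mathrm{ALT}_d$ is odd. In contrast, any function on $d + 1$ variables that can be computed by a decision tree of depth $d$ or less has an even number of satisfying assignments, because a leaf at depth $j \le d$ is reached by $2^{d+1-j}$ assignments. Hence $\mathrm{ALT}_d$ is in $\mathrm{DEPTH}(d + 1) \setminus \mathrm{DEPTH}(d)$. For $d \ge 0$, we define the set $S_d$ of vectors of length $d + 1$ recursively as follows:

$$S_0 = \{0, 1\},$$

$$S_d = \begin{cases} 0S_{d-1} \cup \{1(00)^i(11)^{(d-1)/2-i}1 \mid i = 0, 1, \dots, (d-1)/2\} & \text{if } d \text{ is odd;} \\ 0S_{d-1} \cup \{1(00)^i(11)^{d/2-i} \mid i = 0, 1, \dots, d/2\} & \text{if } d \text{ is even.} \end{cases}$$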

An easy calculation shows the following.

Fact 15. $|S_d| = (d^2 + 4d + 8)/4$ for even $d$ and $|S_d| = (d^2 + 4d + 7)/4$ for odd $d$. □
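Fact 15 can be spot-checked by building $S_d$ directly from its recursive definition. The following Python sketch uses our own helper `S`, which returns bit strings. It checks the vector lengths and both closed forms for $d < 12$.

```python
def S(d: int) -> set:
    """The vector set S_d of Section 2.2, as bit strings of length d + 1."""
    if d == 0:
        return {"0", "1"}
    rest = {"0" + v for v in S(d - 1)}
    if d % 2 == 1:
        k = (d - 1) // 2
        extra = {"1" + "00" * i + "11" * (k - i) + "1" for i in range(k + 1)}
    else:
        k = d // 2
        extra = {"1" + "00" * i + "11" * (k - i) for i in range(k + 1)}
    return rest | extra


for d in range(12):
    expected = (d * d + 4 * d + (8 if d % 2 == 0 else 7)) // 4
    assert all(len(v) == d + 1 for v in S(d))
    assert len(S(d)) == expected, d
print([len(S(d)) for d in range(6)])  # [2, 3, 5, 7, 10, 13]
```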

Let $f$ be a Boolean function over $X$ and let $v \in \{0, 1\}^X$ be an input to $f$. Let $f \oplus \{v\}$ denote the function such that $(f \oplus \{v\})(v) = \overline{f(v)}$ and $(f \oplus \{v\})(x) = f(x)$ for any $x \ne v$. In what follows, we will prove that, for each $v \in S_d$, the function $\mathrm{ALT}_d \oplus \{v\}$ can be computed by a decision tree of depth $d$.

Lemma 16. For any $d$ and for any $v \in S_d$, there exists a decision tree of depth $d$ or less that computes the function $\mathrm{ALT}_d \oplus \{v\}$.

Before proceeding to the proof of Lemma 16, we show a technical lemma that 
will be used in the proof of Lemma 16. 

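Lemma 17. For $k \ge 0$ and for $a = a_1 a_2 \cdots a_k \in \{0, 1\}^k$, the function $F_a$ on $2k + 1$ variables is defined as follows.

$$F_a(x_1, \dots, x_{2k+1}) = x_1 (x_2^{a_1} \vee x_3^{a_1} \vee x_4^{a_2} \vee x_5^{a_2} \vee \cdots \vee x_{2k}^{a_k} \vee x_{2k+1}^{a_k}) \vee \bar{x}_1 \bar{x}_2 (x_3 \vee \bar{x}_3 \bar{x}_4 x_5 \vee \bar{x}_3 \bar{x}_4 \bar{x}_5 \bar{x}_6 x_7 \vee \cdots \vee \bar{x}_3 \bar{x}_4 \cdots \bar{x}_{2k} x_{2k+1}),$$

where $x^1$ denotes $x$ and $x^0$ denotes $\bar{x}$. In particular, for the empty string $a$ (i.e., $k = 0$), $F_a(x_1)$ is defined to be the constant 0. If $a_1 \ge a_2 \ge \cdots \ge a_k$ holds, then there is a decision tree of depth $2k$ which computes $F_a$.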

Proof. The proof proceeds by induction on $k$. The base, $k = 0$, is trivial. Assume the induction hypothesis holds for up to $k - 1$.

Case 1) $a_1 = a_2 = \cdots = a_k = 1$. Since

$$F_a|_{x_1=0} = \bar{x}_2 (x_3 \vee \bar{x}_3 \bar{x}_4 x_5 \vee \cdots \vee \bar{x}_3 \bar{x}_4 \cdots \bar{x}_{2k} x_{2k+1})$$

and

$$F_a|_{x_1=1} = x_2 \vee x_3 \vee \cdots \vee x_{2k+1},$$

the decision tree in Figure 3(a) computes $F_a|_{x_1=0}$ and the one in Figure 3(b) computes $F_a|_{x_1=1}$. Thus the function $F_a$ is computed by the decision tree in Figure 3(c).






Fig. 3. The construction of the decision trees for $F_a$.



Case 2) $a_k = 0$. We have

$$F_a|_{x_{2k+1}=0} = x_1 \vee \bar{x}_1 \bar{x}_2 (x_3 \vee \bar{x}_3 \bar{x}_4 x_5 \vee \cdots \vee \bar{x}_3 \bar{x}_4 \cdots \bar{x}_{2k-2} x_{2k-1}),$$

$$F_a|_{x_{2k+1}=1, x_{2k}=0} = x_1 \vee \bar{x}_1 \bar{x}_2 (x_3 \vee \bar{x}_3 \bar{x}_4 x_5 \vee \cdots \vee \bar{x}_3 \bar{x}_4 \cdots \bar{x}_{2k-2}),$$

$$F_a|_{x_{2k+1}=1, x_{2k}=1} = x_1 (x_2^{a_1} \vee x_3^{a_1} \vee \cdots \vee x_{2k-2}^{a_{k-1}} \vee x_{2k-1}^{a_{k-1}}) \vee \bar{x}_1 \bar{x}_2 (x_3 \vee \bar{x}_3 \bar{x}_4 x_5 \vee \cdots \vee \bar{x}_3 \bar{x}_4 \cdots \bar{x}_{2k-2} x_{2k-1}) = F_{a_1 a_2 \cdots a_{k-1}}(x_1, x_2, \dots, x_{2k-1}).$$

Since $F_a|_{x_{2k+1}=0}$ depends on only $2k - 1$ variables, it can be computed by a decision tree of depth $2k - 1$. Similarly, since $F_a|_{x_{2k+1}=1, x_{2k}=0}$ depends on only $2k - 2$ variables, it can be computed by a decision tree of depth $2k - 2$. By the induction hypothesis, $F_a|_{x_{2k+1}=1, x_{2k}=1}$ can be computed by a decision tree of depth $2k - 2$. Hence we can easily construct a decision tree of depth $2k$ computing $F_a$: its root queries $x_{2k+1}$, and the 1-child of the root queries $x_{2k}$. □

Now we proceed to the proof of Lemma 16. 

Proof (of Lemma 16). The proof is by induction on $d$. The base, $d = 0$, is trivial. Assume the induction hypothesis holds for up to $d - 1$.

Case 1) $d$ is even. Put $d = 2k$. It is sufficient to show that, for every $v \in S_d = 0S_{d-1} \cup \{1(00)^i(11)^{k-i} \mid i = 0, 1, \dots, k\}$, we can construct a decision tree of depth $2k$ which computes the function $\mathrm{ALT}_d \oplus \{v\}$. For $v \in 0S_{d-1}$, the induction step follows immediately from the induction hypothesis since $\mathrm{ALT}_d|_{x_1=0} = \mathrm{ALT}_{d-1}$. The tree queries $x_1$ at the root, and on $x_1 = 1$ the function is the constant 1. In what follows, let $v = 1(00)^i(11)^{k-i}$ ($i \in \{0, 1, \dots, k\}$). Let $G_v$ be the mask function defined by $G_v(v) = 0$ and $G_v(x) = 1$ for any $x \ne v$. Since $\mathrm{ALT}_d$ outputs 1 on the input $v$, the function $\mathrm{ALT}_d \oplus \{v\}$ is equivalent to $\mathrm{ALT}_d \cdot G_v$. The function $G_v$ can be expressed as

$$G_v = \bar{x}_1 \vee x_2^{a_1} \vee x_3^{a_1} \vee x_4^{a_2} \vee x_5^{a_2} \vee \cdots \vee x_{2k}^{a_k} \vee x_{2k+1}^{a_k},$$

where $a_1 = a_2 = \cdots = a_i = 1$ and $a_{i+1} = \cdots = a_k = 0$. A simple calculation shows that the function $\mathrm{ALT}_d \cdot G_v$ is equivalent to $F_{a_1 a_2 \cdots a_k}$ defined in Lemma 17. Thus, the induction step follows from Lemma 17.

Case 2) $d$ is odd. Put $d = 2k + 1$. We shall construct a decision tree of depth $2k + 1$ which computes $\mathrm{ALT}_d \oplus \{v\}$ for each $v \in S_d = 0S_{d-1} \cup \{1(00)^i(11)^{k-i}1 \mid i = 0, 1, \dots, k\}$. For $v \in 0S_{d-1}$, the induction step follows immediately from the induction hypothesis since $\mathrm{ALT}_d|_{x_1=0} = \mathrm{ALT}_{d-1}$. It is easy to observe that the negation of the function $\mathrm{ALT}_d$ can be represented as

$$\overline{\mathrm{ALT}_d} = x_1 \vee \bar{x}_1 \bar{x}_2 x_3 \vee \cdots \vee \bar{x}_1 \bar{x}_2 \cdots \bar{x}_{d-1} x_d \vee \bar{x}_1 \bar{x}_2 \cdots \bar{x}_d \bar{x}_{d+1}.$$

This function is equal to the function $\mathrm{ALT}_{d+1}(x_1, x_2, \dots, x_{d+1}, 1)$. By Lemma 17 and by arguments similar to those in the proof of Case 1, we can construct a decision tree of depth $d + 1$ which computes the function $\mathrm{ALT}_{d+1} \oplus \{v'\}$, where $v' = 1(00)^i(11)^{k+1-i}$. The label of the root of the decision tree constructed in the proof of Lemma 17 is $x_{d+2}$. Hence the right son of the root of this tree, whose depth is $d$, computes the function $\overline{\mathrm{ALT}_d} \oplus \{v\}$. Exchanging the leaf labels 0 and 1 yields a decision tree of depth $d$ for $\mathrm{ALT}_d \oplus \{v\}$. □
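For small $d$, Lemma 16 and the parity argument can be checked by computer. The Python sketch below makes one explicit assumption. It reads the odd case of Definition 14 as $\bar{x}_1 \cdot \mathrm{ALT}_{d-1}$, since the restriction $\mathrm{ALT}_d|_{x_1=0} = \mathrm{ALT}_{d-1}$ used in the proof requires it. With the unnegated literal the check fails already at $d = 1$. The helper `min_depth` is our own exhaustive routine, not from the paper. It computes the minimum decision-tree depth of a truth table by trying every variable as the root.

```python
from functools import lru_cache
from itertools import product


def alt(d, xs):
    """ALT_d on the bit tuple xs (length d + 1); odd d uses NOT x_1 (assumption)."""
    if d == 0:
        return xs[0]
    rest = alt(d - 1, xs[1:])
    return (1 - xs[0]) & rest if d % 2 == 1 else xs[0] | rest


def S(d):
    """The vector set S_d as bit tuples."""
    if d == 0:
        return {(0,), (1,)}
    k = (d - 1) // 2 if d % 2 else d // 2
    tail = (1,) if d % 2 else ()
    extra = {(1,) + (0, 0) * i + (1, 1) * (k - i) + tail for i in range(k + 1)}
    return {(0,) + v for v in S(d - 1)} | extra


@lru_cache(maxsize=None)
def min_depth(table):
    """Minimum decision-tree depth; `table` holds (input, output) pairs of a subcube."""
    if len({out for _, out in table}) <= 1:
        return 0  # constant on this subcube: a single leaf suffices
    best = None
    for i in range(len(table[0][0])):
        zero = tuple(p for p in table if p[0][i] == 0)
        one = tuple(p for p in table if p[0][i] == 1)
        if not zero or not one:
            continue  # x_i is already fixed on this subcube
        cand = 1 + max(min_depth(zero), min_depth(one))
        best = cand if best is None else min(best, cand)
    return best


def flipped_table(d, v):
    """Truth table of ALT_d with its output flipped at the point v."""
    return tuple((x, alt(d, x) ^ int(x == v)) for x in product((0, 1), repeat=d + 1))


for d in range(4):
    full = tuple((x, alt(d, x)) for x in product((0, 1), repeat=d + 1))
    assert min_depth(full) == d + 1                                 # ALT_d not in DEPTH(d)
    assert all(min_depth(flipped_table(d, v)) <= d for v in S(d))   # Lemma 16
```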

We are now ready to prove Theorem 13. 



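Proof (of Theorem 13). To prove the theorem, it is sufficient to show that $\mathrm{ALT}_d(x_1, x_2, \dots, x_{d+1})$ is in $\mathrm{PCover}_d(k + 1, k)$. Write $S_d = \{v_1, v_2, \dots, v_{|S_d|}\}$. By Lemma 16, there are trees $T_1, T_2, \dots, T_{|S_d|}$ of depth at most $d$ such that $T_i$ computes $\mathrm{ALT}_d \oplus \{v_i\}$. Since $d \ge 2\sqrt{k}$, $(d^2 + 4)/4 \ge k + 1$ holds. By Fact 15, we have $|S_d| \ge (d^2 + 4d + 7)/4 > (d^2 + 4)/4 \ge k + 1$. Define the family of forests $\mathcal{T}$ as

$$\mathcal{T} = \left\{ \begin{array}{l} (T_1, T_1, T_1, \dots, T_1), \\ (T_2, T_2, T_2, \dots, T_2), \\ \quad \vdots \\ (T_{k+1}, T_{k+1}, T_{k+1}, \dots, T_{k+1}) \end{array} \right\},$$

where each forest consists of $k$ copies of the same tree. It is easy to see that $\mathcal{T}$ covers $\mathrm{ALT}_d^{[k]}$. For each column in $\mathcal{T}$, at most one of the decision trees can be unequal to $\mathrm{ALT}_d$, because the points $v_i$ are distinct. Since there are only $k$ columns but $k + 1$ rows, at least one forest is correct on all $k$ instances. □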
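The covering step of this proof is a pigeonhole argument, and it does not depend on the particular function. The Python sketch below checks it by brute force. Each forest repeats one point-flipped function $f \oplus \{p\}$ in all $k$ positions. The function $x_1 \vee x_2 x_3$ and the points are our own illustrative choices. The sketch only checks the covering property, not the depth bound, which is the content of Lemma 16.

```python
from itertools import product


def flip(f, p):
    """The function f XOR [x == p]: f with its output flipped at the point p."""
    return lambda x: f(x) ^ int(x == p)


def covers(f, n, points, k):
    """Does {(g_p, ..., g_p) : p in points}, with g_p = flip(f, p), cover f^[k]?"""
    cube = list(product((0, 1), repeat=n))
    forests = [flip(f, p) for p in points]  # forest i = (g_i, ..., g_i), k copies
    for instances in product(cube, repeat=k):
        target = tuple(f(x) for x in instances)
        if not any(tuple(g(x) for x in instances) == target for g in forests):
            return False
    return True


f = lambda x: x[0] | (x[1] & x[2])  # x1 OR x2 x3, used here only as an example
pts = [(0, 0, 0), (0, 0, 1), (1, 1, 1)]
print(covers(f, 3, pts, 2))      # True: k + 1 = 3 distinct points
print(covers(f, 3, pts[:2], 2))  # False: only k = 2 points
```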



3 Relation between Cover and Circuit Complexity

In this section, we reveal that there is a close relation between the cover problem and the complexity of circuits with a majority gate at the top. Let $f$ be a Boolean function whose domain is $D$ and let $C$ be any class of Boolean functions over the same domain. The correlation between $f$ and $C$, denoted $\gamma(f, C)$, is defined as

$$\gamma(f, C) = \min_{\mathcal{D}} \max_{h \in C} \Pr_{\alpha \in \mathcal{D}}[f(\alpha) = h(\alpha)],$$

where $\mathcal{D}$ denotes a distribution on $D$. The correlation between $f$ and $C$ is closely related to the complexity of $f$ for circuits whose top gate is a majority gate.

Theorem 18 ([Fre95, GHR92]). Let $f$ be a Boolean function over $\{0, 1\}^n$ and $C$ be a set of functions over the same domain. If $k > (2n \ln 2)(\gamma(f, C) - 1/2)^{-2}$, then $f$ can be represented as $f(x) = \mathrm{MAJ}(h_1(x), \dots, h_k(x))$ for some $h_i \in C$. If $f$ can be represented as $f(x) = \mathrm{MAJ}(h_1(x), \dots, h_k(x))$ with $h_i \in C$, then $\gamma(f, C) - 1/2 \ge 1/k$. □
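The first part of Theorem 18 gives a concrete number of majority inputs that suffices, once $n$ and $\gamma(f, C)$ are known. The following small Python sketch only evaluates that stated bound. It does not construct the functions $h_i$, and the helper name is ours.

```python
import math


def sufficient_majority_size(n: int, gamma: float) -> int:
    """Smallest integer k with k > (2 n ln 2) / (gamma - 1/2)**2."""
    if gamma <= 0.5:
        raise ValueError("the bound is only meaningful for gamma > 1/2")
    bound = 2 * n * math.log(2) / (gamma - 0.5) ** 2
    return math.floor(bound) + 1


print(sufficient_majority_size(10, 0.6))  # 1387
```

The bound scales with $(\gamma(f, C) - 1/2)^{-2}$, so halving the advantage over $1/2$ roughly quadruples the number of inputs needed.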

The proofs of many of our theorems in the previous section rely on the analysis of the agreement probability between a target function and a class of decision trees. We now generalize the definition of the agreement probability to other complexity classes. For a Boolean function $f$ and a complexity class $C$, the agreement probability between $f$ and $C$, denoted $p(f, C)$, is defined as

$$p(f, C) = \max_{\mathcal{T}} \min_{\alpha} \Pr_{h \in \mathcal{T}}[f(\alpha) = h(\alpha)],$$

where $\mathcal{T}$ denotes a distribution on $C$. Below we use the minimax theorem to show that the correlation and the agreement probability are identical for any function and any complexity class.

Theorem 19. For any Boolean function $f$ and any complexity class $C$, $p(f, C) = \gamma(f, C)$.

Proof. We have

$$p(f, C) = \max_{\mathcal{T}} \min_{\mathcal{D}} \Pr_{h \in \mathcal{T}, \alpha \in \mathcal{D}}[f(\alpha) = h(\alpha)] = \min_{\mathcal{D}} \max_{\mathcal{T}} \Pr_{h \in \mathcal{T}, \alpha \in \mathcal{D}}[f(\alpha) = h(\alpha)] = \gamma(f, C).$$

The first and the third equalities follow from a simple calculation: a linear function of a distribution attains its extremum at a point mass. The second equality follows from the minimax theorem for mixed strategies in two-person matrix games. □

By using Theorems 18 and 19, we can prove the following theorem. It says that if $k$ independent copies of a function $f$ on $n$ variables can be computed by decision forests of depth $d$ using $k - 1$ help bits, i.e., if we can save one help bit, then $f$ can be represented as the majority of $O(nk^2)$ functions in DEPTH($d$). An approximate converse also holds.

Theorem 20. Let $f$ be a function on $n$ variables. (i) If $f \in \mathrm{Cover}_d(2^k/2, k)$, then $f$ can be expressed as $\mathrm{MAJ}(h_1, \dots, h_{20nk^2})$ for some $h_i \in \mathrm{DEPTH}(d)$. (ii) If $f \notin \mathrm{PCover}_d(2^k/2, k)$, then $f$ cannot be expressed as $\mathrm{MAJ}(h_1, \dots, h_{2k/\ln(4nk\ln 2)})$ for any $h_i \in \mathrm{DEPTH}(d)$.



Proof (Sketch). Let $f$ be a function on $n$ variables. Put $p(f, d) = 1/2 + 1/z$. (i) Suppose that $f \in \mathrm{Cover}_d(2^k/2, k)$. We have $k > ((z + 2)\ln 2)/2$, since otherwise $1/p(f, d)^k > 2^k/2$, which implies $f \notin \mathrm{Cover}_d(2^k/2, k)$ by Lemma 4.1 in [NRS94]. Thus $z < 2k\ln 2 - 2$. The statement (i) in the theorem immediately follows from Theorems 18 and 19. (ii) Suppose that $f \notin \mathrm{PCover}_d(2^k/2, k)$. We have $k < (z + 2)(\ln(4nk\ln 2))/2$, since otherwise $nk\ln 2/p(f, d)^k < 2^k/2$, which implies $f \in \mathrm{PCover}_d(2^k/2, k)$ by Lemma 4.1 in [NRS94]. Thus $z > 2k/\ln(4nk\ln 2)$. By Theorems 18 and 19, we can easily obtain the statement (ii) in the theorem. □

Finally, we remark that the above theorem can be applied to other computational models. For example, we may obtain a good lower bound for threshold circuits of depth 3 by investigating covers consisting of threshold circuits of depth 2 instead of decision trees.